Singapore’s salad days are over


Uncertainty has replaced confidence as economic reality bites science in the city-state and 
scientists find that their research funds now come with strings attached. 


When Neal Copeland and Nancy Jenkins, a renowned
team of cancer geneticists, left the

US National Cancer Institute in Bethesda, Maryland, for 
the Institute of Molecular and Cell Biology in Singapore in 2006, 
they joined a string of star names in the city-state that suggested its 
remarkable investment in research was paying off. Generous funds 
have flowed to science in Singapore for the best part of a decade, and 
researchers from around the world have followed. Drawn by hefty 
salaries and enviable working conditions, they have rapidly given Sin- 
gapore an international presence. The Genome Institute of Singapore, 
for example, has asserted itself as one of the most important basic 
genomics research organizations in the world. 

Best of all for scientists, despite Singapore's reputation for top-down 
autocracy, its investment in research came with relatively few strings 
attached. The administration realized that researchers prefer to have 
the freedom to follow their curiosity and that, to attract the best minds, 
they needed to loosen the reins. As a result, Singapore’s biomedical 
infrastructure seems set to enter the next stage in its development, in 
which researchers looking for their next posts — especially the much- 
sought promising young researchers and postdoctoral students — are 
starting to consider Singapore, not only because of the large grants, but 
also because of its scientific reputation and intellectual ferment. 

To many outsiders, the Singapore experiment seemed too good 
to be true — and perhaps it was. Singapore is not immune to the 
economic pressure mounting on research communities around the 
world, and policy-makers everywhere want returns on their invest- 
ments. Rumours of purse-tightening measures have grown over the 
past year, but researchers in the city-state were still stunned by the 
news in September that almost one-third of the total research budget 
will be abruptly shifted to competitive ‘industrial alignment funds’.
Access to that funding will now depend on researchers’ abilities to 
show that their work has industrial applications. The policy will affect 
all research but is aimed particularly at the biomedical sciences, which 
senior figures feel are not pulling their weight.

Nobody should cry for Singapore's scientists, who don't expect sym- 
pathy. They have been living large and will continue, if they can prove 
themselves, to be paid generously. And having to write grant applica- 
tions is not enslavement — it is the norm for most researchers around 
the world. The problem is not Singapore's shifting priorities, but how 
the government is implementing the change. 

In response to a call for research proposals last month, Singapore's 
scientists have had to scramble to draft application-oriented propos- 
als. They know that industrial contracts would help. But, given the 
shaky state of the global pharmaceutical industry, such contracts are 
not easy to come by. Many applications are going in with a weak note: 
“industrial partner to be decided”. Singapore's scientists worry that, 
given only weeks or months to secure deals, they will be forced into 
unfavourable agreements. One researcher at Singapore’s Agency for
Science, Technology and Research says that the policy is an attempt to
turn the agency “into a contract-research organization overnight”.

Researchers also worry that the government has not made clear 
how it will review the sudden influx of research applications. Singa- 
pore has used external review committees to audit its institutes in the 
past. But reviewing individual grants is a different and much more 
labour-intensive procedure if done properly. Will Singapore be forced
to rely on a small number of bureaucrats and selected scientists for
reviews? Frustrated by the changes, Copeland and Jenkins have decided
to leave Singapore. Many other scientists there are also looking for
new posts.

The government should move quickly to clarify the grant-review
process. Easing the industrial-application restrictions would help
scientists in the short term. More fundamentally, as researchers have
suggested, the government could phase in the funding changes over the
next few years, rather than introducing them all at once.

Singapore's rapid transformation came about through massive, per- 
haps even excessive, funding. The move to align scientific objectives 
with economic reality is understandable. But it would be a huge waste 
if doing so with undue haste and insufficient planning were to destroy 
Singapore's impressive experiment.




Animal instinct 


Germany must better explain the scientific use 
of animals to remain a major biomedical force. 


Ten years ago, researchers using animals in Britain found themselves
in a particularly hostile environment. A campaign of intimidation and
violence by animal-rights extremists had spun out of control. The
London-based lobby group Understanding Animal Research, a historic
organization founded in 1908, responded with a counter-campaign of
its own that, in 2005, smoothed the introduction of laws giving the
police increased powers to stop extremists from harassing scientists
and from harming animal-research organizations.

Scientists in Germany have not yet experienced such a degree of
violence, although the potential is there. In one incident in Munich,
activists rented billboard space to display the name, home address and
telephone number of a scientist whose research involved animals. In
another, they distributed flyers describing a local researcher as a
killer and torturer. Similar or worse incidents have occurred in other
cities such as Bremen and Tübingen, where biomedical research is
particularly strong.

Biomedical scientists in Germany perceive a separate crisis — 
increasing legislative restrictions that make it more difficult to carry 
out animal experiments. Hearing little to the contrary from research- 
ers themselves, the public tends to assume that animal experiments are 
an unnecessary evil, so politicians respond with more restrictions. 

That problem was a major motivation for the Basel Declaration — 
drafted and signed at a meeting in Basel, Switzerland, last week 
(see page 742). Its signatories pledge to engage in open debate with 
the public about their work on animal experiments, to stress the high 
ethical standards to which they adhere and to explain why they have 
to do it. They intend, for example, to visit local schools or to mention 
that their research used animals when speaking to the press about 
new results. Germany does not have a national organization such as 
Understanding Animal Research to manage and maintain this public 
outreach. Now is the time that it should. 

During the UK crisis, Understanding Animal Research used the 
momentum of the campaign against extremist violence to engage Brit- 
ish scientists to establish a public dialogue. Education on the medical 
value of animal research helped to dismantle knee-jerk public sympathy 
for animal-rights campaigns and encouraged politicians to act. 

In the 1990s, the pharmaceutical industry largely abandoned 
Germany as a research base, put off by restrictions on genetic tech- 
nologies and the use of animals. Today the country is a leader in 
biomedical research, and red tape around genetic technologies has 
been significantly reduced. Yet the animal issue remains sensitive. 
Scientists there have tended to keep their heads down and hope 
for the best. But they should fear the worst: a crisis such as that 
in Britain could arise at any time. Some of the five main German
research organizations, such as the Max Planck Society, which runs
80 research institutes in different disciplines, and the DFG,
Germany's main research-granting agency, acknowledge the
animal-experiment problem and have small offices that monitor
legislative activities. But they do not engage in significant public
outreach.
The solution must be a single, non-partisan national office that can
implement the principles of the Basel Declaration. It need not be
large (Understanding Animal Research has only nine staff), but it
needs to be professional. Busy researchers do not have the time or the
lobbying skills to organize long-term concerted action. Who should
pay? When it comes to the defence of research in Germany in general,
the research organizations and universities band together as the
formidable Alliance of German Science Organizations, the ‘Allianz’.
Successive governments have deferred to it, and have committed to
long-term funding increases even in times of financial crisis.

The Allianz is the appropriate body to create and fund a German 
organization analogous to Understanding Animal Research. Including 
industry might breed distrust. The Basel Declaration has shown that 
animal researchers in Germany are willing to go public. A small invest- 
ment by all members of the Allianz would bolster this new solidarity 
and serve as insurance for Germany’s biomedical effort and for the 
new biomedical industries that are springing up. 

The animal issue has an intrinsically emotive nature and is never 
going to go away. To keep the public ignorant of the benefits of animal
research, without which it is currently impossible to develop any
new therapies, was never a solution.


Give up the ghosts 


Funding agencies should make researchers 
reveal industry links.


The spectral fingerprints of a big drug company have once again
been found all over academic publications. Documents released
last week by a watchdog group based in Washington DC
raise concerns about the role of writers paid by GlaxoSmithKline
(GSK) in works attributed to psychiatric researchers at a number of
US institutions. They add to the drumbeat of allegations in recent
years indicating that such ghostwriting, in which articles contain
substantial portions written by someone who is not listed as an
author, is endemic in the biomedical literature.

The documents were made available as a result of litigation over
GSK’s antidepressant Paxil (paroxetine) and were pounced on by
the Project on Government Oversight, which raised concerns about
authorship of a research article, journal editorial and textbook.

The researchers did acknowledge the alleged ghostwriters of the
textbook and the editorial in notes, but only for “editorial support”.
For the journal article, which appeared in a supplement to
Psychopharmacology Bulletin, GSK is thanked for an “unrestricted
educational grant”. But the original front page of the manuscript,
which the academic author is instructed to remove before submission
to the journal, declares that it was prepared by writers from
Scientific Therapeutics Information, a company based in Springfield,
New Jersey, hired by GSK. The article and textbook discuss the uses of
Paxil. The editorial, in Biological Psychiatry, gives an overview of
depression as a major and growing public-health problem, which
certainly does no harm to a company aggressively marketing an
antidepressant.

The academic authors and the American Psychiatric Association,
which published the textbook, have strongly denied that the
pharmaceutical giant influenced its content. So, too, have the authors
of the editorial and the journal article. GSK shareholders, then, may 
wonder what the company got for its money. The issue here is not that 
industry-financed experts cannot write useful and unbiased reports, 
but that their role must be declared in full. It is for readers, not authors, 
to conclude that there is no conflict of interest. 

All the academic authors involved in this case have been recipients 
of US National Institutes of Health (NIH) funding; all but one still 
are. The NIH may argue, rightly, that the ghostwritten publications 
did not use its money. It will also note, correctly, that this is an issue
that demands far broader action. Both are beside the point. Money is 
fungible, and rarely do the studies and intellectual output of senior 
researchers divide neatly into industry-funded and taxpayer-funded 
work. If its grantees are not playing by the rules, the NIH is tarred and 
public trust is damaged. So, how clear are the rules on ghostwriting? 
A study last year found that just 10 out of 50 top US academic medical 
centres had explicit, web-accessible policies that prohibit the practice. 
Another three banned ghostwriting in practice without naming it as 
such (J. R. Lacasse and J. Leo PLoS Med. 7, e1000230; 2010). 

Discussing the issue of ghostwriting a year ago, Francis Collins, the 
NIH director, said publicly that he was “shocked” that “people would 
allow their names to be used on articles they did not write, that were 
written for them, particularly by companies that have something to 
gain by the way the data is presented”. Many will share that shock, 
but, unlike Collins, few are in a position to do something about it. 
The agency is “considering how best to address and ensure” greater 
transparency and accountability as its grantees develop and author 
articles, Sally Rockey, NIH chief of extramural research, told Nature 
in an e-mail last week. 

A good start would be for the NIH to require all institutions that 
take its funds to articulate, publicize and vigor- 
ously enforce a clear ban on ghostwriting. Other 
funders should follow suit. Without such a clear 
signal, and the willingness to give a ban teeth, 
this troubling ghost will linger at the feast.
WORLD VIEW


Despite decades of awareness, science is still inherently sexist.
Women are vastly under-represented in professorships and in
national academies worldwide. This is a familiar problem, but
less highlighted is how the discrepancy plays out in the public arena 
of science — the media. 

Male science pundits dominate television, radio and print — 
including the pages of opinion and comment in this journal. This 
imbalance cannot simply be explained by the shortage of female 
professors, as many male pundits are still at an early stage of their 
academic careers, when genders are better balanced. So what is 
behind this effective invisibility of women scientists in our media? 
And why does it matter? 

Many people think that women themselves are to blame for the 
male-dominated media, in science and other 
subjects. Women, who often bear the brunt 
of domestic obligations, are said to have less 
time than men to participate in activities out- 
side their work. And female colleagues tend to 
say that they do not feel eminent or qualified 
enough to comment. Perhaps this points to 
a question of confidence — one that does not 
seem to bother most men. Women may also be 
uncomfortable with the cut and thrust of con- 
flict and debate. Indeed, at scientific seminars I 
have attended, most of the questions come from 
men, despite the audience usually containing an 
equal number of women. Voicing one’s opinion 
in a public arena is a charged activity that seems 
to discourage many women, yet this is precisely 
the skill that a good pundit needs. 

This still cannot explain the near-total 
absence of women pundits. Sexism must be responsible too. Having 
both the inclination and the time to do media work myself, I have 
certainly found myself dropped for programmes and replaced by 
less-qualified men. A prominent television producer once refused to 
put a colleague on screen because, he said, people wouldn't swallow 
science offered from “a young, blonde girl” like her. I have voiced 
opinions during panel discussions to little effect, then watched a 
man next to me say the same thing to widespread applause. In group 
discussions, I find that women are often talked over by men as if they 
weren't even in the room, whereas men are more likely to let other 
men finish their sentences. More insidiously, it is well documented 
that what passes for spirited assertion in men is interpreted, by both 
sexes, as unpleasant aggression in women. Given this bias, I under- 
stand why many women might prefer not to
get involved.


Women scientists
must speak out 


Female researchers still battle sexism. The media gives them an opportunity to 
be heard alongside male colleagues, says Jennifer Rohn. 


visibility in the public arena. First and most importantly, women 
need to speak up. They could start in the relative safety of their own 
academic departments, preferably during their PhD studies. It is not 
easy — a famous female professor recently admitted to me that she 
still gets palpitations when asking questions at high-profile academic 
seminars — but nerves never killed anyone. Work through them, 
and you will gain respect as someone who has intelligent things to 
say and is not afraid to share them. Verbal sparring at seminars can 
also help your career because it builds confidence, develops an abil- 
ity to communicate ideas and can even lead to collaborations. (And, 
palpitations aside, it gets easier with practice.) From speaking out at
seminars, I found it natural to progress to media work, which, as well 
as being challenging and enjoyable, hones your powers of analysis 
and persuasion — skills that are useful for all 
scientists, regardless of sex. 

Second, keep in mind that, to the media and 
its audience, you don’t have to be an eminent
professor to have a valuable opinion — any PhD 
student or postdoc is miles ahead of the public 
in terms of scientific knowledge. Start a blog 
about your own research to refine your opinions 
and develop a style. As you gain more research 
experience, give your name and telephone 
number to your institution’s press office, and 
don’t shy away if asked. Similarly, don’t be afraid 
to stray from your specialized niche of research 
expertise: if you are reasonably well read on a
general topic, your opinion will still be useful. 
It is important to participate, because if we 
scientists aren't ready to step into the gap at short 
notice, the press may choose someone who isn’t 
qualified at all — a real problem when the story is about homeopathy 
or other quackery. 

Some might question if it matters whether we have more female 
science pundits, as long as the men are doing the job well. I think it 
does. A female messenger could attract a more diverse crowd, includ- 
ing other women. The point of punditry is often to persuade people 
that science is worthwhile and, more to the point, deserves funding. 
Also, pundits help to put forward scientific recommendations and 
counter misinformation. When it comes to controversial issues such 
as climate change, childhood vaccinations or genetically modified 
food, we need as many people as possible to hear and engage with our 
arguments. Women should stand shoulder to shoulder with their male 
colleagues to make this happen.


Jennifer Rohn is a cell biologist at University College London and 
editor of LabLit.com. Her most recent book is The Honest Look (Cold 
Spring Harbor Laboratory Press). 

e-mail: jenny@lablit.com 
RESEARCH HIGHLIGHTS




Stars reach 
their limits 


Stars seem to have a size limit 
of 100-150 times that of the 
Sun; the reason for this has 
been the subject of debate. 
Some astronomers think that 
a large star’s own radiation 
blows away the gas it needs to 
grow, whereas others suggest 
that a star’s progenitor cloud 
fragments. 

A new simulation bolsters 
the fragmentation theory. 
Thomas Peters at the 
University of Heidelberg in 
Germany and his colleagues 
simulated the birth of massive 
stars in a cloud of gas and 
found that other, smaller stars 
formed from the fragmenting 
gas before the largest one 
could grow too big. The 
simulations are supported by 
some observations, and should 
lead to a better understanding 
of how big stars form. 
Astrophys. J. 725, 134-145 (2010) 


Scorpions glow
to sense


Scorpions’ fluorescence under ultraviolet (UV) light may help them
to detect and avoid the light. Because night-time levels of UV light
correlate with the Moon phase, this could enable the creatures to
detect moonlight and remain obscured on moonlit nights.
Carl Kloock and his team at California State University in
Bakersfield reduced the glow of 15 female Paruroctonus becki
scorpions (pictured) by exposing them to 16 hours of low-level UV
light per day. The authors placed the creatures, along with 15
control, fluorescing scorpions, in Petri dishes that were painted
black across one half. The scorpions were then exposed to infrared
(IR) light only, IR and UV, or IR and white light.

The team found that, when exposed to UV light, the fluorescent
scorpions were less active than the reduced-fluorescent ones, moving
less often between the light and dark parts of the Petri dishes.
J. Arachnol. 38, 441-445 (2010)


Bacteria that thrive on arsenic


A bacterium discovered in a lake high in arsenic not only metabolizes
the normally toxic element, but also seems to incorporate it into its
DNA and other molecules in place of phosphorus. This hints at a
biochemistry very different from that long thought to underlie life
on Earth.

Felisa Wolfe-Simon at the US Geological Survey in Menlo Park,
California, and her colleagues found the microbe in California’s
Mono Lake (pictured). When cultured in arsenate with only trace
amounts of phosphate, the organism grew at a rate equal to 60% of
that it achieves in phosphate.

Using radiolabelling and mass spectroscopy, the team found arsenic
in cellular fractions of the bacterium’s proteins, lipids, metabolites
and nucleic acids in amounts similar to those expected for phosphate
in normal cell biochemistry. X-ray analysis suggested that the
arsenic takes the form of arsenate, and bonds with carbon and oxygen
similarly to phosphate.
Science doi:10.1126/science.1197258 (2010)
For details of the mixed reactions to this surprising finding see p.741.


Antiseptic silver
slivers


Silver is toxic to bacteria, and nanoparticles of the element offer
promise as a coating for medical devices. But silver nanoparticles
readily oxidize and clump together, losing their antibacterial
activity. Chunhai Fan at the Chinese Academy of Sciences in Shanghai
and his colleagues solved this problem by growing the particles on
biocompatible silicon nanowires. This avoids the need for toxic or
expensive chemicals to stabilize the silver nanoparticles.

The researchers show that exposure to a 10% solution of silver-coated
nanowires froze population size in the bacteria Escherichia coli and
Bacillus subtilis throughout a two-day test period.
Adv. Mater. doi:10.1002/adma.201001934 (2010)


Taking wing on a
beam of light 


A beam of light has been used 
to provide lift to a micrometre- 
sized curved rod in a manner 
analogous to that by which air 
passing over a wing provides 
lift to birds and aeroplanes. 




Grover Swartzlander at 
the Rochester Institute of 
Technology in New York and 
his colleagues shone a weakly 
focused laser beam through 
the roughly semi-cylindrical 
rods, which refracted the 
light rays. This refraction 
changed the direction of the 
rays’ momentum, causing
an equal and opposite 
momentum change on the 
rods themselves. Because of 
the rods’ asymmetrical shape, 
the momentum shift was 
directed more towards one 
side, driving the rods upwards 
at around 2.5 micrometres 
per second. 

The researchers suggest 
that the technique could be 
used to transport microscopic 
machines through liquids and 
to help to steer solar sails in 
spacecraft. 

Nature Photon. doi:10.1038/nphoton.2010.266 (2010)

For a longer story on this research, see go.nature.com/ye4zid.


NEUROSCIENCE
Enzyme helps pain 
persist 


Pain perception in mice is 
maintained for several days 
by augmented activity of a 
particular enzyme in a brain 
area associated with chronic 
pain. Blocking the enzyme, 
PKMζ, with a peptide inhibitor
alleviates the pain. 

Bong-Kiun Kaang at 
Seoul National University, 
Min Zhuo at the University 
of Toronto in Canada and 
their colleagues show that in 
the days following a nerve 
injury, mice had higher 
levels of PKMζ in a brain
region known as the anterior 
cingulate cortex (ACC). 
Injecting the PKMζ inhibitor
into the ACC led to a drop in 
synaptic activity, or neuronal 
communication, in that 
region, as well as a decrease 
in pain responses. 

The authors suggest that 
PKMζ mediates chronic
pain by boosting synaptic 
transmission in the ACC. 
Science 330, 1400-1404 
(2010) 


ECOLOGY 


Reptiles rose 
after forests died 


The disappearance of vast 
tracts of tropical forest some 
305 million years ago led to 
an explosion in the global 
diversity of reptiles and 
amphibians, thanks to the 
emergence of many new, 
fragmented habitats. 
Howard Falcon-Lang at 
Royal Holloway, University 
of London, in Surrey, UK, 
and his colleagues compared 
the distribution and diversity 
of these animals in the fossil 
record. During the period they 
studied, climate change dried 
up equatorial rainforests in the 
land mass that later became 
Europe and North America. 
Many of the species that lived 
across these forests became 
extinct, and were replaced by 
a wealth of different types of 
reptile and amphibian that were 
particular to isolated habitats. 
Amphibians, which depend on 
aquatic environments, fared 
less well than reptiles, which 
were able to adapt to a drier 
world. 
Geology 38, 1079-1082 (2010) 


Ring-a-ring o’
benzene 


Loops of interconnected benzene rings have long fascinated chemists,
who have now developed a more flexible way to string the rings
together.

Kenichiro Itami and his co-workers at Nagoya University in Japan
created the loop, or cycloparaphenylene (CPP, pictured below left),
by first coupling L-shaped and linear molecules to form U-shaped
ones. They then combined two of these molecules to form the desired
O-shaped CPP.

This modular approach has potential for producing specific CPPs of
any size greater than 13 benzene rings. The loops could be useful
for fabricating single-walled carbon nanotubes of specific diameter,
by growing the loops horizontally (pictured right).
Angew. Chem. Int. Edn doi:10.1002/anie.201005734 (2010)


COMMUNITY CHOICE

Full immunity needed to fight cancer

Highly read on www.cell.com in November

Certain targeted cancer drugs shrink tumours by shutting down key
genes. But researchers report that this may not be enough to vanquish
cancer: a functional immune system is also a prerequisite.

Immune cells are known to be important in restricting
tumour formation, but less is known about their role in tumour
regression. Dean Felsher at Stanford University in California
and his team switched off genes required for tumour growth in
mouse models of lymphoma and leukaemia. They found that
the rate of tumour shrinkage fell when the mice lacked an intact
immune system (to as little as one-thousandth of the normal
speed) and the frequency of tumour recurrence rose.

The team discovered that immune cells called CD4+ T cells
are needed to shut down blood-vessel growth and to trigger
tumour-cell senescence. Moreover, a protein produced by the
T cells called thrombospondin 1, which blocks blood-vessel
formation, seems to be key to fending off tumours.

Cancer Cell 18, 485-498 (2010)


Speaking in 
borrowed tongues 


Languages evolve in a similar
way to biological organisms, 
with ancestral languages 
splitting into descendent ones. 
In language evolution, ‘lexical 
borrowing’, whereby a word is
transferred from one language 
to another, is also common. 
Linguists have struggled to 
distinguish between words 
that have descended and those 
that have been borrowed. 

Tal Dagan at the Heinrich 
Heine University in 
Düsseldorf, Germany, and her
colleagues looked for instances
of borrowing by analysing the 
relationships between 2,346 
words of basic vocabulary 
with similar meanings from 84 
Indo-European languages. By 
studying networks of related 
words, the researchers found 
that, on average, 8% of the 
basic vocabulary in each of the 
languages is borrowed. Basic 
vocabulary was previously 
assumed to be fairly immune 
to borrowing. 

Proc. R. Soc. B doi:10.1098/rspb.2010.1917 (2010)


CORRECTION 

In “The source of sour taste” 
(Nature 468, 603; 2010), 
the mice with tagged bitter, 
sweet and umami taste 
cells were tested by Liman 
et al. but engineered by
another lab. Furthermore, 
in response to acids, sour- 
taste cells did not conduct 
sodium ions, which were 
previously thought to 
mediate sour sensing. 


Nikumaroro Island, part of the Republic of Kiribati, is surrounded by waters rich in tropical tuna species.


Islands champion tuna ban 


Pacific nations to restrict fishing across a vast swathe of international waters. 


BY CHRISTOPHER PALA 


A bold move by eight Pacific island
nations to preserve the world’s last
large stocks of tuna is expected to face
strong resistance this week at a meeting of the 
Western and Central Pacific Fisheries Com- 
mission (WCPFC) near Honolulu, Hawaii. 
By leveraging agreements with foreign 
fishermen in their own territorial waters, the 
islands have banned fleets that fish with purse- 
seine nets — mechanized nets that can capture 
entire schools of tuna in a single haul — from 
operating in a region of international waters 
roughly the size of India. The area, known 
as the Eastern High Seas (see map, overleaf), 
will still be fished by hook-and-line, which is 
considered biologically more sustainable. The 
islands will also cut the time that purse-seiners 
can spend fishing in their territorial waters by 


nearly a third. The restrictions, agreed to by 
the eight nations in April, are scheduled to take 
effect on 1 January 2011. 

Marine biologists say the development is a 
major step forward for efforts to halt the global 
decline of bigeye, yellowfin, skipjack and other 
tropical tuna species. In October this year, Brit- 
ain turned the entire Exclusive Economic Zone 
around the Chagos Islands, in the centre of the 
Indian Ocean, into a no-take zone, making it 
the first area rich in tuna that has been closed 
to fishing. At 3.2 million square kilometres, the 
Eastern High Seas is six times larger. 

“These are the most far-reaching ocean-conservation measures ever,”
says Daniel Pauly, a leading fisheries scientist at the University of
British Columbia in Vancouver. “For the first time since man has been
fishing out the open oceans, we're going to see a reversal of the
decline of pelagic species in two big areas.”

At the meeting this week, the world’s major 
fishing nations, including the United States, are 
expected to challenge the measures. By treaty, 
the United States is technically exempt from 
the restrictions, but two years ago it chose to 
side with the island nations in the closing of 
two smaller pockets of international waters to 
foreign fleets. 

This time, conference sources predict a less 
sympathetic attitude. “We're not totally settled 
in our positions,” says Charles Karnella of the
National Oceanic and Atmospheric Adminis- 
tration, who heads the US delegation. “We're 
renegotiating our treaty and how the closure is 
dealt with will be part of our talks.” 

Ships from the United States operate under
the South Pacific Tuna Treaty, in which the
US government pays most of their licence
fees and provides US$18 million in foreign
aid to 14 island nations. In exchange, the US
fishing fleet, now set at 40 ships, has unlimited
access to the region.

Should the United States decide not to abide
by the new closures, US purse-seiners could
find themselves fishing there virtually alone,
contributing to the depletion of bigeye tuna
at a time when fisheries scientists are calling
for a 30% reduction in bigeye catch to avoid
the collapse of stocks, which have fallen from
1.2 million tonnes to 500,000 tonnes since 1952
(see ‘Bigeye at risk’).


TUNA FACTS
Bigeye at risk

Second only to the dwindling bluefin in
economic importance and vulnerability, the
bigeye tuna (Thunnus obesus) is the deep
diver of the tuna family. It typically spends
most of its day foraging for fish and squid at
depths of several hundred metres. At night, it
rises closer to the surface, following its prey.
Adults can reach 2.5 metres long and weigh
up to 180 kilograms.

Prized for the texture, taste and colour of
its meat, the bigeye is replacing bluefin as
the most expensive tuna for sushi.

Adults are caught by hook-and-line
vessels in numbers that researchers say are
sustainable. But teenage bigeyes like to swim
with schools of the same-sized, adult skipjack
when these assemble in huge schools around
fish-aggregating devices used by the
purse-seine fleet. All end up in cans.

Although juvenile bigeyes are not preferred
by canneries, purse-seiners use the devices
to increase their take of skipjack.

Environmentalists and Pacific islanders
have called for the devices to be banned to
spare the bigeyes.


MARINE HAVENS

Sari Tolvanen, a Greenpeace International
oceans campaigner attending the meeting,
points out that the closure area is bordered
on both sides by no-take areas created when
former US president George W. Bush named
the islands of Wake, Johnston, Jarvis, Howland,
Baker and Palmyra Atoll as a Marine National
Monument in January 2009.

“It would be shameful if the Obama
administration did not follow the Bush
administration's example and opted out of the
conservation measures taken by the Pacific
island nations,” she says.


NEW REFUGES FOR TUNA
The eight nations of the Parties to the Nauru Agreement aim to ban the


The decision to end purse-seining in the
Eastern High Seas was taken by the Parties
to the Nauru Agreement (PNA), eight Pacific
island nations in whose waters 80% of the
region’s tuna is fished. They comprise the
Federated States of Micronesia, Kiribati, the
Marshall Islands, Nauru, Palau, Papua New
Guinea, the Solomon Islands and Tuvalu.

Although no nation can legally restrict
fishing in the high seas, the PNA countries have
jointly amended the standard contracts they
sign with foreign fleets with a stipulation that
the fleets refrain from fishing in some
international waters, in order to remain eligible for
licences within waters directly controlled by
the islands.

The regulations are made enforceable by the
use of radio transponders, which reveal the
positions of the licensed ships at all times. The
first such measures, which began in 2008, closed
smaller pockets of international waters that were
being used as refuges for vessels fishing illegally.
The extension of the ban to the Eastern High
Seas is seen as potentially far more important
because this region is large enough to have an
impact on species preservation.

Officials from the PNA say that once they
have succeeded in ending purse-seining in the
Eastern High Seas, they will ban longliners
(a different type of fishing vessel that catches
about the same tonnage as a purse-seiner)
from the area, effectively turning it into the
world’s largest marine reserve.

An important factor that will determine the 
effectiveness of the ban is the extent to which 
fish stay within the protected area. John Hamp- 
ton, head of the fisheries programme at the 
Secretariat of the Pacific Community (SPC), 
co-authored a study conducted farther to the
west. This found that half the skipjack tuna there
spend their entire lives within a radius of
675–750 kilometres (J. Sibert & J. Hampton Mar. Pol.
27, 87–95; 2003). A new SPC study under way
in the Central Pacific focuses on bigeye tuna,
with 15,000 tuna tagged so far, he says. It will
provide information on tuna movement and
mixing with adjacent areas, rates of mortality
and other important population parameters.
“With good tagging data, we'll have a better
understanding of the way the tuna move, and
that will help us predict the effects of the
closures on the population levels,” he says.

Pauly predicts that because the individual 
tuna that do not travel long distances will survive 
in greater numbers, their offspring will have a 
genetic advantage over those that do range more 
widely, enhancing the conservation value of the 
refuges. “It's going to make the islands and the
seamounts much more important, more attractive
to these fishes,” Pauly says.

Because this is the first time that anyone has 
tried to end fishing in a large body of inter- 
national waters, it could establish a precedent, 
observers say. Until now, the world’s seas have 
been fair game to the international indus- 
trial fishing fleets, with the commissions that 
theoretically have the power to restrict them 
strongly influenced by the fleets themselves. 

“The closures would raise legal issues if they 
weren't justified as conservation measures,” 
says Satya Nandan, chairman of the WCPFC. 
“Our stocks are relatively healthy and we want 
to keep them that way.”


Microbe gets toxic response


Researchers question the science behind last week’s revelation of arsenic-based life. 


BY ALLA KATSNELSON 


Days after an announcement that a strain
of bacteria can apparently use arsenic in
place of phosphorus to build its DNA
and other biomolecules (an ability unknown
in any other organism), some scientists are
questioning the finding and taking issue with
how it was communicated to non-specialists.

Many readily agree that the bacterium, 
described last week in Science and dubbed 
GFAJ-1 (F. Wolfe-Simon et al. Science doi: 
10.1126/science.1197258; 2010), performs a 
remarkable feat by surviving high concentra- 
tions of arsenic in California's Mono 
Lake and in the laboratory. But data 
in the paper, they argue, suggest that 
it is just as likely that the microbe 
isn't using the arsenic, but instead is 
scavenging every possible phosphate 
molecule while fighting off arsenic 
toxicity. The claim at a NASA press 
briefing that the bacterium repre- 
sents a new chemistry of life is at best 
premature, they say. 

“It's a great story about adaptation,
but it’s not ET,” says Gerald Joyce, a
biochemist at the Scripps Research 
Institute in La Jolla, California. 

At the press briefing, Steven Benner, 
a chemist at the Foundation for 
Applied Molecular Evolution in Gainesville, 
Florida, who was invited to the event to offer 
outside comment, used the analogy of a steel 
chain with a tinfoil link to illustrate that the 
arsenate ion said to replace phosphate in the 
bacterium’s DNA forms bonds that are orders 
of magnitude less stable. Not only would the 
organism’s DNA have to stay together in spite of 
the weaker bonds, says Benner, but so would all 
the molecules required to draw arsenate from 
the environment and build it into the genetic 
material. Co-authors of the paper, including 
Paul Davies, an astrobiologist at Arizona State 
University in Tempe, have countered that the
arsenate bonds could be reinforced by specialized
molecules, or that arsenic-based life simply
has a higher turnover for molecular disintegration
and assembly than does conventional life.

The big problem, however, is that the authors 
have shown that the organism takes up arsenic, 
but they “haven't unambiguously identified any
arsenic-containing organic compounds”, says
Roger Summons, a biogeochemist at the
Massachusetts Institute of Technology in Cambridge.
“And it’s not difficult to do,” he adds, noting that
the team could have directly confirmed or dis- 
proved the presence of arsenic in the DNA or 
RNA using targeted mass spectrometry. 


The cells’ large vacuoles may indicate that they are sequestering arsenic. 


Some researchers suggest that the authors’ 
own data hint at an organism that is simply 
absorbing and isolating arsenate while mak- 
ing use of the trace phosphates in its environ- 
ment. For one thing, says Joyce, the paper 
shows that the organisms appear bloated, 
and contain large, vacuole-like structures — 
often a sign of sequestered toxic material. The 
arsenate-grown cells were analysed in their 
resting phase, which requires less phosphate 
for survival than does active growth, notes 
Joyce, and cells grown in high concentrations
of arsenate did not seem to contain any RNA,
possibly because RNA production had shut
down to conserve phosphate. One calculation
in the paper showed that the DNA in arsenate- 
grown cells actually contained 26 times more 
phosphorus than arsenic. 

“I fault the authors for not noticing these
things and sorting them out,” says Rosemary
Redfield, a microbiologist at the University of
British Columbia in Vancouver, Canada, whose
summary of the paper’s problems, posted on
her blog on 4 December, has already had more
than 30,000 hits. “We shouldn't have to do the
thinking for them.”

Felisa Wolfe-Simon, a NASA astrobiology 
research fellow at the US Geological Survey 
in Menlo Park, California, and the 
study’s lead author, refused to address 
criticisms. “We are not going to engage 
in this sort of discussion,” she wrote 
in an e-mail to Nature. “Any discourse 
will have to be peer-reviewed in the 
same manner as our paper was, and 
go through a vetting process so that all 
discussion is properly moderated.” 

But Jonathan Eisen, a microbiol- 
ogist at the University of California, 
Davis, calls this “ludicrous”, after 
a NASA press release drew media 
attention with claims of an “astrobiology
finding that will impact the search
for evidence of extraterrestrial life”, a
theme that Wolfe-Simon echoed at the
briefing. “It is absurd for them to say that they
are only going to have the discussion in the
scientific literature, when they started it,” he says.

Ginger Pinholster, a spokeswoman for 
Science’s publisher, the American Association 
for the Advancement of Science in Washington 
DC, noted that the journal regards significant 
responses to high-visibility articles, as well as 
efforts to replicate the work, as a “key goal of
publication”. Pinholster also pointed out that
the journal’s own press summary of the paper
made no mention of the search for extraterrestrial
life, nor did Science “organize any additional
promotional events”.


Basel Declaration defends animal research


Dialogue with public is key to reducing opposition over the use of lab animals. 


BY ALISON ABBOTT 


Animal activists last summer set fire
to the alpine holiday home of Daniel
Vasella, then chief executive of pharmaceutical
giant Novartis of Basel, Switzerland,
in one of relatively few violent attacks on
scientists working with animals in
German-speaking countries.

But in the past few years these scientists have 
been feeling the pressure in other ways — from 
animal activists who have attempted to publicly 
shame them or have sent threatening e-mails, 
and from legislation that increasingly restricts 
the use of animals in basic research. 

Now, in a bid to reverse that trend, more 
than 50 top scientists working in Germany 
and Switzerland have launched an education 
offensive. Meeting in Basel on 29 November, 
they drafted and signed a declaration pledging 
to be more open about their research, and to 
engage in more public dialogue. 

“The public tends to have false perceptions 
about animal research, such as thinking they 
can always be replaced by alternative methods 
like cell culture,” says Stefan Treue, director 
of the German Primate Center in Göttingen.
Treue co-chaired the Basel meeting, called
“Research at a Crossroads”, with molecular
biologist Michael Hengartner, dean of science
at the University of Zurich, Switzerland.
Outreach activities, such as inviting the public into
universities to talk to scientists about animal
research, “will be helpful to both sides”.

The Basel Declaration reiterates the legal
and ethical requirements of the signatories to
reduce the use of animals as far as possible (see
‘Lab animals used in Germany’), and to keep
their suffering to a minimum. But it also
forcefully disputes recent efforts to declare animal
use in basic research less acceptable than
animal use in experiments that may yield practical
benefits. In recent legal cases in the two
countries, courts have interpreted national laws as
forbidding basic research on primates.


LAB ANIMALS USED IN GERMANY
The total declined in the 1990s, but has since flattened with growing use of transgenic mice.


In Bremen, Germany, the local government
decided in 2007 not to renew the licence of
neuroscientist Andreas Kreiter to work on
macaques because the work was “too far from
applications” (ref. 1). This put a stop to his research
recording from the animals’ brains as they
performed simple tasks. The ban still holds.

This echoed a case in Switzerland, where
Kevan Martin, a director of the Institute of
Neuroinformatics in Zurich, had to halt his
research programme to map the functional
microcircuitry of the brain of macaques in
2006 after Zurich's authorities declined to renew
his licence for primate work. The authorities,
which also banned other local projects
involving primates, said that Martin's work offended
the dignity of the animals (which has been
protected in the Swiss constitution since 2004)
and would not reap practical benefits for society
in the near term.

Martin appealed the decision in Switzerland's
supreme court, but in September last year the
court upheld the decision.

“That was a shock for the community, and
was one of the main motivations for holding
our meeting,” says Hengartner. “For the first
time in Switzerland, the law was making a
distinction between basic and applied research,
and arguing that basic research with
non-human primates is less valuable than applied
research,” he says. “But in biomedicine they are
one and the same; applied research stands on
the shoulders of basic research.”

Scientists in Germany — which, unlike Swit- 
zerland, is a member of the European Union 
(EU) — had another reason to sign the declara- 
tion. In September, the EU approved a directive 
on animal use in research that must be trans- 
lated into national law in the 27 EU member 
states within the next two years. “There are
a lot of broad terms in the directive, like
‘severe pain’, that could allow countries like
Germany to choose wording which makes the
legislation more restrictive than
intended,” says Treue.
In addition, some of the directive’s rules are 
unscientific, he says. For example, it includes 
special rules for cats and dogs. 

Treue is particularly concerned that the
directive bans some experiments outright
(such as those involving severe pain, or using
great apes) rather than allowing ethical
committees to regulate them. However, early drafts
of the EU directive were much worse, says Treue,
one of many scientists who talked to EU
parliamentarians to help to ensure that a ban on
basic research using primates was removed.

In the United Kingdom, similar initiatives 
coupled to animal terrorism laws helped to cut 
extremism, says Mark Matfield, director of the 
London-based European Biomedical Research 
Association, which represents animal research- 
ers across Europe. “Being open and discussing 
with the public why you sometimes need to 
use animals is a reliable and tested idea which 
improved the climate for research in Britain.” 

The declaration will now be sent to deans of 
medicine and other research leaders in Ger- 
many and Switzerland to garner support. The 
organizers hope to go on to promote it
internationally. “The animal issue is never going to go
away,” says Treue. “We need solidarity among
all researchers.” SEE EDITORIAL, P.731

1. Schiermeier, Q. Nature 455, 1159 (2008).


Luc Montagnier, co-discoverer of HIV, is no stranger to controversy.


Trial draws fire 


Nobel laureate to test link between autism and infection. 


BY DECLAN BUTLER 


Luc Montagnier is applying unorthodox
ideas to the treatment of autism. With
support from the Autism Research Institute
(ARI), based in San Diego, California, the
Nobel laureate is about to launch a small clinical
trial of prolonged antibiotic treatment in
children with autism disorders. The trial will
also use techniques based on Montagnier’s
research into the notion that water can retain a
‘memory’ of long-vanished pathogens, and that
DNA sequences produce water nanostructures
that emit electromagnetic waves, published last
year. But experts are critical and worry that the
nobelist’s status may lend unwarranted credibility
to unconventional approaches to autism.

The planned pilot trial in France, funded
by a US$40,000 grant from ARI, will screen
around 30 children with autism disorders and
20 or so controls for bacterial infections, and
then test whether months of antibiotic treatment
improve the children’s condition. Montagnier,
who shared the 2008 Nobel prize for
physiology or medicine for the discovery of
HIV, concedes that there is no solid scientific
evidence that infection causes or contributes
to autism, but he argues that many parents and
physicians have observed “spectacular” benefits
from prolonged treatment. Stephen Edelson,
director of ARI, says he’s “very excited” about
the “cutting-edge, groundbreaking” study.

Catherine Lord, a clinical psychologist working
on autism at the University of Michigan in
Ann Arbor, says that the trials are “not
mainstream science”. Lord says that many of the
widely practised alternative medicine treatments
for autism (including dietary modification,
nutritional supplements and chelation
therapy) are “semi-medical, not evidence-based
science, and more pseudoscience.”

Edelson, however, says that there are so many 
forms of autism and so much that is not known 
that “we need to study every angle”. Criticisms 
of the science base of alternative approaches 
“probably would have been true were it ten
years ago”, he says, but critics don’t appreciate
how much research has been done since. 

“I'm just interested in helping these
children,” Montagnier says. He acknowledges that
many mainstream scientists are sceptical of his 
work, but defends his ideas. “In 1983, we were 
only a dozen or so people to believe that the 
virus we had isolated was the cause of AIDS.” 

Since then, Montagnier has supported non- 
mainstream theories in AIDS research that 
have put him at odds with other scientists. 
Most recently, he has argued that strengthen- 
ing the immune system with antioxidants and 
nutritional supplements needs to be consid- 
ered along with antiretroviral drugs in fighting 
AIDS, in particular in Africa. 

“Montagnier’s embrace of pseudoscientific and
fringe agendas over the past few years has been
seized on by AIDS denialists and other fringe
groups, who make the case that Montagnier
now supports their crazy views,” says John
Moore, an AIDS virologist at Cornell University
in New York. Montagnier says that AIDS
denialist groups misrepresent his thinking.

NATURE.COM
For more on Nobel laureates’ views on science see:
go.nature.com/fmmdne

The autism trial enters a new area of
controversy. The Infectious Diseases Society of America
has reviewed long-duration antibiotic
treatments in Lyme disease, and concluded in April
that the “inherent risks of long-term antibiotic
therapy were not justified by clinical benefit”.
Montagnier acknowledges that safety concerns 
exist, but argues that opposition to long anti- 
biotic treatments can also be “dogma”. The trials 
will need to be cleared with the relevant ethics 
and regulatory bodies, he notes, and will include 
careful precautions and surveillance. “Expert 
physicians have learned to avoid side effects and 
to choose the right regimen,” he says. 

Another element of the trial is also attract- 
ing scepticism. Besides screening the children 
for pathogens with conventional DNA-ampli- 
fication techniques, the researchers will use a 
diagnostic test based on the controversial idea 
championed by the late French scientist Jacques 
Benveniste, who claimed that water can retain 
the memory of substances it contained even 
after they have been diluted away. Studies have 
failed to confirm the claim, but Montagnier 
thinks that the ‘memory’ structures in the water 
can resonate with low-frequency electromag- 
netic signals, which he hopes can be transmit- 
ted over the Internet. He claims that very dilute 
solutions of pathogen DNA also emit such 
signals, and he intends to use this as a sensitive 
‘biomarker’ for chronic infection. 

Montagnier has published two papers on his
research into the memory of water, one on
bacterial DNA (ref. 1) and another claiming to have
found electromagnetic signals of HIV DNA (ref.
2) in patients treated with antiretroviral drugs
and whose blood seemed virus-free. He speculates
that this may be the HIV reservoir from which the
virus rebounds when antiretroviral treatment
is paused or stopped. Several AIDS researchers
contacted by Nature dismissed Montagnier’s
claims but declined to comment publicly.

Montagnier, who says that he is planning
independent replication of his findings,
published both papers in a new journal from
Berlin-based Springer (Interdisciplinary Sciences:
Computational Life Sciences), the editorial board
of which he chairs. Asked why he didn't try to
publish his astonishing findings in a
higher-profile journal, Montagnier explained that he
was sure that if he had sent them to Nature or
Science, he would have run foul of experts who,
on seeing mention of Benveniste or
‘memory-of-water’, would “reach for their revolvers”.


1. Montagnier, L. et al. Interdisciplin. Sci.: Comput. Life


Restoration of Australia’s drying Murray-Darling basin has caused a rift between farmers and scientists.


River chief resigns 


Plan to save Australian river system runs aground. 


BY STEPHEN PINCOCK IN SYDNEY 


The challenge of balancing human needs
for water with those of the environment
has come to a head in Australia’s ailing
Murray-Darling river system, where an
ambitious plan to restore the ecosystem was
thrown into turmoil on 7 December after the
leader of the scheme resigned.

Michael Taylor, chairman of the government- 
appointed Murray-Darling Basin Authority, 
announced he was stepping down just 2 months 
after releasing a draft plan for water manage- 
ment that prompted protests from the region's 
farmers, but which was largely backed by many 
of Australia's environmental scientists. 

Covering 1 million square kilometres in 
southeastern Australia, the Murray-Darling 
basin is a vital agricultural region, containing 
rich wetland habitats and around 50 endan- 
gered species of birds and mammals. But 
decades of poor water management and 
drought have left ecosystems in crisis and many 
farmers short of water. 

Research by the Commonwealth Scientific
and Industrial Research Organisation in
2007–08 revealed that water use in the basin
had reduced average annual streamflow at the
mouth of the Murray River by 61%, and that the
river now fails to reach the sea 40% of the time. This has
brought ecological problems such as increased 
salinity, algal blooms and the collapse of native 
fish and waterbird populations. 

Successive state and national governments in
Australia have struggled to manage the region's
water resources. In October, the basin authority
released a blueprint to address the problem by
reducing the amount of water diverted for
irrigation from the basin’s 77,000 kilometres of rivers.
The main recommendation, that 3,000–4,000
billion litres of water should be released from
agricultural use each year and returned to the
environment, triggered months of heated
debate about the effect on farm communities.

The government asked Taylor to ensure the 
plan balanced ecological outcomes with the 
impact on jobs or other socioeconomic fac- 
tors. But in his resignation statement, Taylor 
said he had received legal advice that the plan 
“cannot compromise the minimum level of water 
required to restore the system's environment 
on social or economic grounds”. He urged the 
government to reconsider the plan’s next phase. 

Richard Kingsford, director of the Australian 
Wetlands and Rivers Centre at the University of 
New South Wales in Sydney, said that the res- 
ignation was “symptomatic of what a difficult 
process this is. It’s true that a basin plan can’t
do everything.”

Kingsford and 57 other environmental sci- 
entists recently released a statement saying that 
debate over the plan was too focused on short-term
economic pain rather than the long-term
economic benefits of a healthy river system. He
and his colleagues argue that the next version 
of the basin plan, due early next year, should 
use 3,000-4,000 billion litres as the minimum 
release volume. “Michael Taylor's resignation 
might provide for a new form of leadership to 
come in and take the issue on,” Kingsford says. 

The final plan for the Murray-Darling basin 
is due to be tabled in the Australian parliament 
at the end of 2011 following consultation with 
the public, state governments and ministers.


Self-plagiarism case prompts calls for agencies to tighten rules


Technology is bringing down instances of duplication, despite variability in oversight. 


BY EUGENIE SAMUEL REICH 


Is plagiarism a sin if the duplicated material
is one’s own? Self-plagiarism may seem a
smaller infraction than stealing another
author’s work, but the practice is under
increasing scrutiny, as the eruption two weeks
ago of a long-standing controversy at Queen's
University in Kingston, Canada, makes clear.
Colleagues of Reginald Smith, an emeritus
professor of mechanical and materials engineering
at Queen’s, say that up to 20 of Smith’s
papers contain material copied without
acknowledgment from previous publications.
University officials first learned of the
duplications in 2005, and they eventually led to
an investigation by the Natural Sciences and
Engineering Research Council (NSERC),
which funded some of Smith’s work, including
experiments on board the US space shuttles.
Although Smith avoided censure for research
misconduct, three papers were subsequently
retracted by the Annals of the New York
Academy of Sciences (ref. 1) and one by the Journal of
Materials Processing Technology (ref. 2). The situation
was recently made public in news reports and
has led to calls for stronger powers by funding
agencies in Canada to discipline researchers
who engage in the practice.

“He was a very good scientist, but something
happened and he got into this business
of duplicating papers,” says Chris Pickles, a
metallurgist at Queen's who raised concerns
about Smith’s publication practices after
spotting some duplications under Smith’s name
while searching an online database. Smith
referred a request for comment to his lawyer,
Ken Clark of law firm Aird and Berlis in
Toronto, Canada, who notes that many of the
republications duplicated material from
conference proceedings, which in an earlier epoch
would not usually have been published. He also
notes that Smith is retired, and does not stand
to gain financially from his republications.

Many researchers say that republication
without citation violates the premise that each
scientific paper should be an original
contribution. It can also serve to falsely inflate a
researcher's CV by suggesting a higher level
of productivity. And although the repetition
of the methods section of a paper is not
necessarily considered inappropriate
by the scientific community, “we would
expect that results, discussion and the abstract
present novel results”, says Harold Garner, a
bioinformatician at Virginia Polytechnic
Institute and State University in Blacksburg.
Garner’s research group used an automated
software tool to check the biomedical
literature for duplicated text, and identified more
than 79,000 pairs of article abstracts and titles
containing duplicated wording. He says work
on the database of partly duplicated articles,
called Déjà vu (go.nature.com/hgq2t4), has
led to close to 100 retractions by journal editors
who found the reuse improper. An analysis by
Garner in the press at Urologic Oncology (ref. 3) shows
that while the total quantity of biomedical
literature has risen steadily since 2000, cases of
republication stopped rising after 2003 and fell
sharply between 2006 and 2008 (see graph). “It
actually does look like it’s getting better,” says
Garner. “People who would ordinarily step
across the line are not doing it.”

NATURE.COM
Journals step up plagiarism policing:
go.nature.com/kdmlsa


DROP IN DUPLICITY?
There has been a decline in the number
of new highly similar pairs of manuscripts.

He credits increased vigilance by journal
editors who are using his free tool or commercially
available software to check submissions
for repeated text and halt dubious papers before 
they reach publication. 

NSERC’s policy on integrity in research 
makes no specific reference to plagiarism 
or self-plagiarism, which has led to calls for 
tougher rules in the wake of the publicity 
over Smith's case. In the United States, the 
National Science Foundation (NSF) takes a 
strong stance on plagiarism in general, says 
Christine Boesz, who was inspector-general 
at the NSF from 1999 until 2008. “The NSF 
got into the plagiarism game early,” she says.
Numbers obtained by Nature under the US 
Freedom of Information Act show that, since 
2007, the agency has found between 5 and 13 
cases of plagiarism each year. In contrast, the 
US Department of Health and Human Serv- 
ices’s Office of Research Integrity (ORI), which 
is responsible for overseeing alleged plagiarism 
associated with National Institutes of Health 
research, has reported no cases of plagiarism 
of text over the past three years, but has found 
up to 14 scientists a year guilty of falsification 
or fabrication of data (see table). 

Ann Bradley, a spokeswoman for the ORI, 
says the office's working definition of plagiarism 
(go.nature.com/p15kcu) excludes minor cases. 
Nick Steneck, director of research ethics and 
integrity at the University of Michigan in Ann 
Arbor, says authorities worldwide should adopt 
a uniform misconduct policy that provides 
clear guidance not only on data falsification 
and fabrication but also on lesser ethical
breaches, such as self-plagiarism.


1. Braaten, D. Ann. NY Acad. Sci. 1176, 228 (2009). 
2. Smith, R. W., DeMonte, A. & Mackay, W. B. F. 
J. Mater. Process. Tech. 153-154, 589-595 (2004). 
3. Garner, H. R. Urol. Oncol.-Semin. Ori. doi:10.1016/j.urolonc.2010.09.016 (in the press).


CASES OF MISCONDUCT AND PLAGIARISM AS REPORTED BY US RESEARCH AGENCIES 


Office of Research Integrity (ORI) National Science Foundation (NSF)
2008 2009 2010 2008 2009 2010

Debarments for falsification/fabrication 2 1 3 1 2 3
Debarments for plagiarism 0 0 0 4
Findings of falsification/fabrication 7 14 1 1 2
Findings of plagiarism 0 0 0 5 13 10

Number of funded researchers: National Institutes of Health (ORI) 325,000; NSF 98,820 (2010). 2010 data run until August.


MOLECULAR BIOLOGY


Stem-cell progress 


Replacing genes with drugs could allow safe reprogramming. 


BY EWEN CALLAWAY 


Ever since scientists first switched adult
cells into an embryonic-like state
from which they can develop into any
tissue type, recipes for making these induced
pluripotent stem (iPS) cells have multiplied.

However, many rely on the introduction 
of foreign genes by viruses, which makes the 
altered cells unsuitable for use in patients. Now 
researchers have replaced all but one of the 
genes with a cocktail of chemicals, taking sci- 
entists a step closer to creating patient-specific 
iPS cells that could be used in the clinic. 

The advance, by chemist Sheng Ding at the 
Scripps Research Institute in San Diego, California,
and his colleagues (Zhu, S. et al. Cell
Stem Cell 7, 651-655; 2010) is an adaptation of
an approach developed by Shinya Yamanaka’s 
lab at Kyoto University in Japan. Yamanaka’s 
group — one of two to first create iPS cells — 
infected adult cells with viruses carrying the 
genes OCT3/4, SOX2, KLF4 and c-MYC. 


However, these iPS cells have foreign DNA 
peppered throughout their genomes, where 
it might interrupt genes that protect against 
tumours. Over the past few years, scientists 
such as Ding have developed safer ways to 
make pluripotent cells, by delivering repro- 
gramming factors in other ways (see ‘Virus- 
free iPS cells’). 

In 2008, Ding’s team showed that a mixture
of chemicals and two genes could reprogram
neural progenitor cells, which already express
other genes needed to make iPS cells (Shi, Y.
et al. Cell Stem Cell 2, 525-528; 2008). Now
his group has made human iPS cells from skin
cells by treating them with drugs and just one
virus-delivered gene, OCT4. The resulting cells
express the same genes as embryonic stem cells
and can transform into different types of cell.

OCT4 can be replaced as well, so an iPS protocol
entirely free of foreign genes shouldn't
be far off. Ding says that his team has already
created mouse iPS cells using only drugs, and
is making progress with human cells.

Robert Lanza, chief scientific officer of
Advanced Cell Technology in Marlborough,
Massachusetts, says that iPS cells should soon be
safe enough to test in humans. “I think we now
have the tools to contemplate clinical trials.”


VIRUS-FREE IPS CELLS

Reprogramming-factor delivery methods, with pros and cons

Proteins
Pros: No foreign genes involved
Cons: Protein modifications could influence reprogramming, expensive to manufacture

Transposons
Pros: Just one piece of DNA is inserted into the genome
Cons: The transposon must be removed using a specialized enzyme

RNA
Pros: Quicker and more efficient than other methods
Cons: Potential for immune response, expensive to manufacture

Small molecules
Pros: No foreign genes, potentially cheap to make
Cons: Drugs can affect unintended protein targets


THE SUPERSOLID'S NEMESIS


John Reppy has come out of retirement to question the high-profile 
discovery of a new kind of quantum matter.


BY EUGENIE SAMUEL REICH 


The fourth time he is asked what the dental floss is for, John Reppy
seems to hear. He picks up a pair of scissors, and starts snipping
away at the plastic strands wound round the shiny beryllium-copper
components of his torsional-oscillator experiment. “I want to
make a change to it anyway,” he says. As he snips, pieces of wire and piping
begin to pop out of the neat cylindrical column he has built, making
it completely clear what the floss is for: to hold everything down.

The pieces of this experiment, in a basement lab at Cornell University’s
Clark Hall in Ithaca, New York, span more than half of Reppy’s 50-year
career studying the behaviour of helium cooled to ultra-low temperatures.
Near the top of the metre-long column is a 30-year-old refrigeration unit
that Reppy found among the bric-a-brac in his lab a few years after he
signed up for retirement. Below that is a torsional oscillator of the type he
invented in the 1970s: a cylindrical vessel, just a few centimetres across,
that is free to twist back and forth around a rod running down the centre
of the cylinder. When the vessel is filled with the isotope helium-4 via
pipes wrapped around the column, and when its temperature is gradually
lowered, changes in the oscillation frequency reveal changes in the physical
properties of the helium. At two-tenths of a degree above absolute
zero, for example, the helium-4 condenses into a solid crystal, and may
even turn into a ‘supersolid’, a strange quantum state in which some of
the atoms seem to pass through others without friction.

John Reppy holds up a torsional oscillator used in his experiments.
S. OGDEN

Or it may not. In recent months, the results from his apparatus have 
led the 79-year-old, semi-retired Reppy to become a vocal critic of a 
2004 claim by physicists Moses Chan and Eun-Seong Kim that they had 
formed a supersolid in Chan's laboratory at Pennsylvania State Univer- 
sity in University Park. The stakes are high: other such macroscopic- 
scale quantum effects, such as superconductivity and superfluidity, have 
won their discoverers Nobel prizes. And Reppy knows his criticisms 
are raising hackles in the field. Others have replicated Chan and Kim's 
results, yet no one has replicated Reppy’s contradictory finding. “My
result is a bolt out of the blue,” he admits.

Nonetheless, as the inventor of the modern torsional-oscillator appa- 
ratus, and as Chan's former supervisor, Reppy has a professional stature 
that makes his views impossible to ignore. “He's come up with a lot of 
inventive, clever experimental techniques, and always manages to pick 
out the experiment that reveals what’s really going on,” says David Lee
at Texas A&M University in College Station, one of the winners of the 
1996 physics Nobel prize for the discovery of superfluidity in another 
isotope of helium, helium-3, work done while he was at Cornell. 


HEART OF THE MATTER 

The roots of the supersolid controversy go back to 1969, when Russian 
physicists predicted a state of solid matter in which gaps, or vacancies, 
in a crystal structure could move together as a single quantum wave, a
collective motion reminiscent of the frictionless flow of a superfluid.

In 2004, Chan and Kim reported the first experimental evidence
consistent with this ‘supersolid’ behaviour. They found that the
back-and-forth swings of a torsional oscillator filled with solid helium-4 sped
up as the temperature was lowered to below two-tenths of a degree above 
absolute zero — just as would be expected if a supersolid were forming 
inside. The idea is that the zero-friction quantum effects predicted by the 
Russians effectively decouple some of the atoms in the solid and prevent 
them from oscillating along with the rest of the atoms. This makes the 
inertia of the oscillator smaller than the total quantity of helium would 
suggest, which leads to the faster oscillations. Chan and Kim's claim 
prompted enormous excitement, and about a dozen researchers began 
building torsional oscillators in a bid to replicate the observation. 
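
The reasoning can be put in symbols. What follows is a sketch under textbook assumptions for an ideal torsional oscillator. The symbols κ (the stiffness of the torsion rod), I_cell (the moment of inertia of the empty vessel), I_He (that of the helium) and φ (the fraction of the helium that decouples) are introduced here for illustration and are not taken from the original reports.

```latex
% Resonant frequency of an ideal torsional oscillator
f = \frac{1}{2\pi}\sqrt{\frac{\kappa}{I_{\mathrm{cell}} + I_{\mathrm{He}}}}

% If a fraction \varphi of the helium decouples, only (1 - \varphi) I_He still
% swings with the vessel, so the frequency rises
f' = \frac{1}{2\pi}\sqrt{\frac{\kappa}{I_{\mathrm{cell}} + (1 - \varphi)\, I_{\mathrm{He}}}} > f

% For a small change in inertia (\varphi I_He << I_cell + I_He),
% the fractional speed-up is approximately
\frac{f' - f}{f} \approx \frac{\varphi\, I_{\mathrm{He}}}{2\,\left(I_{\mathrm{cell}} + I_{\mathrm{He}}\right)}
```

Read this way, a measured frequency shift can be turned into an estimate of the decoupled fraction φ, provided nothing else in the apparatus changes with temperature.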

Reppy was one of them. When Chan and Kim first reported their 
results, Reppy was approaching the end of a five-year Cornell programme
intended to ease older faculty members into retirement by steadily reduc- 
ing their hours, teaching responsibilities and lab space. He was looking 
forward to a retirement spent rock-climbing — a field in which his repu- 
tation looms as large as it does in physics. A world-class climber since his 
student days, Reppy is famous for his invention and promotion of clean- 
climbing techniques, in which the nuts that hold the rope are wedged into 
existing, natural cracks in rock faces rather than banged in like pitons. He 
says he likes the technique not just because it is environmentally friendly, 
but because it is easy. And safe: when he talks about climbing, he doesn't 
emphasize the obvious excitement it gives him, so much as his caution. 
“You always climb with a partner,” he says.

But with supersolidity promising a different kind of adventure, Reppy 
decided to make a comeback from retirement. When he heard that a fel- 
low faculty member at Cornell had taken on a graduate student, Sophie 
Rittner, to replicate Chan and Kim's experiment, Reppy suggested that 
she work with him instead: just as in climbing, he needed a partner. Rit- 
tner did not immediately jump at the idea of signing on with a retired 
professor with a lab full of junk and no active research group. But it was
obvious that Reppy would be a good supervisor for her work, as he had
had a long career developing experimental tricks for studying superfluidity
in helium-3. “I came to appreciate the fact that I had an adviser with a
huge amount of time. He was super hands-on,” says Rittner. Together they
ordered the parts for a torsional oscillator, which Rittner constructed. 
Then every morning, Reppy came into the lab at about 7 a.m. and started 
the experiment going. He would stay until about 4 p.m.; Rittner came in
later in the morning and stayed into the evening.

The pair soon saw the increased oscillation that Chan and Kim had 
reported. But in February 2006, they tried something new and got a 
surprise. After one run of recording data, Reppy and Rittner let the 
frozen helium warm up to just above 1 kelvin, and then lowered the 
temperature again to repeat the run. The second time around, the speed- 
up was markedly diminished. 

Heating and then recooling a crystal, a process called annealing, is in 
general expected to remove defects in the crystal structure. To Reppy 
and Rittner, the implications of their observations were clear: the super- 
solid signal was not due to an intrinsic quantum behaviour of a pure 
crystal, but was somehow caused by disorder in the structure, which is 
why it went away when the defects did. 


IN A SPIN 

When Rittner presented the results at the March 2006 American Physical
Society meeting, there was something of an uproar, she says. “People
were saying ‘what is this?’” The findings threatened to make supersolid- 
ity substantially less interesting, because effects caused by imperfections 
and impurities often turn out to be impossible for theoretical physicists 
to calculate exactly. Even now, six years after Chan and Kim’s experiment 
was published, there is still no comprehensive theory of supersolidity. 
And when Rittner gained her PhD, she decided to move on to a differ- 
ent research field. 

Left on his own, Reppy — an inveterate tinkerer — was soon try- 
ing to improve his apparatus. Picking up a box containing many of his 
historical torsional oscillators, he gives it a rattle, selects one and points 
out an interesting ridge. He loves to shape the metal pieces himself, he 
says. And with no more administrative duties to distract him, he adds 
happily: “I can spend all my time in the machine shop.”

One of Reppy’s first moves after Rittner left was to make a new oscillator
vessel, which, instead of holding the helium only in a ring-shaped channel
circling the cylinder’s rim, also included a channel across the middle. He
filled the oscillator with helium-4, ran the apparatus and verified that he
saw the faster oscillations attributed to a supersolid. Then he blocked one
side of the ring, so that the putative supersolid could flow only through
the central channel and the other side of the ring, and found that the
signal decreased, just as the supersolid theory would predict. But then he
also blocked the central channel to try to stop the supersolid flowing at
all, which should have made the signal go away entirely. But it didn’t.
Thinking that there must be a leak, Reppy tried several variations,
including just watching to see if the helium-4 escaped like air out of a
balloon. It didn’t.

Craftsman at work: Reppy loves to fashion the parts he needs for his research.

Reppy has never understood this observation, and hasn't published it. But
the unexpected behaviour planted a seed of doubt in his mind: was the
formation of a supersolid the true explanation for the effects that everyone
had seen? There were other discordant findings too. For example, liquid
helium-4 ought to be able to flow through a solid helium-4 barrier if that
solid contains some supersolid. But neither Reppy and Rittner, nor a group
led by John Beamish at the University of Alberta in Edmonton, were able to
observe this.

By late 2009, Reppy had tinkered with his apparatus yet again, adding a
diaphragm on top that allowed him to deform a sample during a measurement
run. Following up on the possibility that disorder was involved in
supersolidity, he wanted to see if he could increase the amount of
supersolidity in a given sample by using the deformation to introduce more
defects. The results of this experiment were totally unexpected: Reppy
found no evidence of a supersolid signal at all, at least not at ultra-low
temperatures. Instead, the deformation produced a decrease in the
oscillation frequency at higher temperatures, so high that the jiggling of
atoms would be expected to destroy any quantum effect such as supersolidity.


SOLID GROUND?

The publication of these results in June 2010 caused another stir, including
a news article in Science claiming that the evidence for a supersolid
was “slipping away”. In a commentary in Physics, Beamish suggested
that Reppy had discovered a kind of “quantum plasticity”, an effect in
which solid helium-4 radically increases its softness as its temperature is
raised, then stiffens again as the temperature is lowered. That stiffening
would cause the frequency of a torsional oscillator to increase and mimic
the supersolid signal. Reppy has embraced that idea and interpreted his
results as a repudiation of supersolidity: an indication that he, Chan,
Kim and everyone else had in fact been seeing the disappearance of
the previously undetected quantum plasticity at low temperatures. He
insists that he takes no joy in that conclusion. “I’m disappointed that this
is turning out to be something other than a supersolid,” he says.

But others in the community are not so convinced. Is the evidence
for supersolidity really slipping away, or does Reppy just have anomalous
equipment? Sebastien Balibar, an expert on helium-4 at the Ecole
Normale Supérieure in Paris, says he believes two novel effects have been
discovered: supersolidity and a radical change in elasticity, something
akin to Beamish’s quantum plasticity. Without meaning to, Balibar says,
Reppy configured his vessel to be especially sensitive to changes in the
elastic properties of the material, perhaps because the stiffer helium-4 at
lower temperatures was effectively gluing together parts of his experimental
chamber, and producing a heavier oscillator. Asked about that possibility,
Reppy whisks the visitor out of his lab to a blackboard in a breezy corridor,
where he chalks out a calculation showing why he feels the gluing
hypothesis is extremely unlikely.

Even so, Reppy is having difficulty getting others in the field to share his
doubts about the supersolid. Kim and his colleagues have just published
additional results showing what they say is conclusive evidence for
formation of a supersolid. Reppy has seen Kim's work, and says that he feels
the problem presented by his own results hasn't yet been addressed properly.
But the field of supersolid helium-4 is too small and collaborative for
Reppy’s result to be ignored. Chan, for one, naturally bridles at the
suggestion that the supersolid interpretation is in trouble, but he has also
asked Reppy to collaborate, to gain a better understanding of his equipment.

Chan points out that even when Reppy and Rittner replicated the 2004
experiment, they were reporting supersolid fractions of 20%, 20 times
greater than the 1-2% measured by other groups. He takes that as evidence
that there's a secondary effect at work in Reppy’s apparatus that is
swamping the supersolid signal. Chan hopes that could be understood by
testing the vessel’s response when filled with better-studied superfluid
helium-3.


SUPERSOLID OR NOT?

Experts disagree on whether helium enters a rare
quantum state at very low temperatures.

Reppy’s interpretation: Above 0.25 K, too high for a quantum effect, Reppy
claims to see elasticity that slows the oscillator down. Below 0.25 K: no
supersolid.

NATURE.COM
For more on supersolids see: go.nature.com/hbwgp7

Meanwhile, Reppy’s latest, unpublished, results are giving him new 
cause for doubt. These data were taken with a secondary oscillator added 
to the bottom of his experiment, which allows him to vary the frequency 
of the vessel's oscillation. His preliminary finding is that the response 
of the helium-4 sample depends on that frequency — which would not 
be the case if the helium-4 was a supersolid. But, Reppy wonders, doodling
with chalk on a cartoon sketch of his vessel, could this be a way to
turn the critiques of his experiment into a bonus? He starts drawing an 
alternative configuration of the apparatus, in which he could produce 
the first measurement of the elasticity of helium-4. Asked what light 
that would shed on the formation, or otherwise, of a supersolid, he
shrugs. “I don’t know,” he says.

Chan says that in a similar situation, with an experiment giving very 
surprising results, he probably wouldn't have published anything. But 
researchers in this field are having to feel their way experimentally 
because of the absence of a guiding theory. And, as tends to happen with
a quintessential experimentalist, Reppy’s caution inevitably gives way to 
dogged determination once he is confident that each result is real. “That
20%, he knows it’s unusual, but he felt compelled to publish it,” Chan
says. “Whatever way it turns out, I think respect for him will grow.”


Eugenie Samuel Reich is a reporter for Nature.


A taste of things to come?


Researchers are sure that they can put lab-grown meat on the
menu, if they can just get cultured muscle cells to bulk up.


BY NICOLA JONES


Mark Post has never been tempted to taste the ‘fake’ pork that he grows in
his lab. As far as he knows, the only
person who has swallowed a strip of the pale,
limp muscle tissue is a Russian TV journalist
who visited the lab this year to film its work.
“He just took it with tweezers out of the culture
dish and stuffed it in his mouth before I could
say anything,” says Post. “He said it was chewy
and tasteless.”

Post, who works at the Eindhoven University
of Technology in the Netherlands, is at the
leading edge of efforts to make in vitro meat
by growing animal muscle cells in a dish. His
ultimate goal is to help rid the world of the
wasteful production of farm animals for food
by helping to develop life-like steaks. In the
near term, he hopes to make a single palatable
sausage of ground pork, showcased next to the
living pig that donated its starter cells, if he
can secure funds for his research.

Post started out as a tissue engineer interested
in turning stem cells into human muscle for
use in reconstructive surgery, but switched to
meat a few years ago. “I realized this could have
much greater impact than any of the medical
work I'd been doing over 20 years, in terms
of environmental benefits, health benefits, benefits
against world starvation,” he says. Largely
because of the inefficiency of growing crops to
feed livestock, a vegetarian diet requires only
35% as much water and 40% as much energy
as that of a meat-eater. Future ‘in-vitrotarians’
should be able to claim similar savings.


FROM PIG TO PLATE

Researchers are adapting tissue engineering
techniques to grow edible meat, in vitro.

(1) Take a small biopsy
(2) Extract myosatellite cells


The prospect of an alternative to slaughtering
animals led People for the Ethical Treatment
of Animals, based in Norfolk, Virginia, to
announce two years ago a US$1-million prize 
for the first company to bring synthetic chicken 
meat to stores in at least six US states by 2016. 
In the Netherlands, where the vast majority 
of work has been done so far, a consortium of 
researchers convinced the government to grant 
them €2 million (US$2.6 million) between 
2005 and 2009 for developing in vitro meat. 

Such incentives have helped to solve some 
of the basic challenges, applying human tissue- 
engineering techniques to isolate adult stem 
cells from muscle, amplify them in culture and 
fuse them into centimetre-long strips. But far 
more money and momentum will be needed to 
make in vitro meat efficient to produce, cheap 
and supermarket-friendly. Post estimates that 
creating his single sausage will require another 
year of research and at least $250,000. So what 
still needs to be done? 


CHOOSE THE RIGHT STOCK 

The first question for researchers is which cells 
to start with. Embryonic stem cells would pro- 
vide an immortal (and therefore cheap) stock 
from which to grow endless supplies of meat. 
But attempts to produce embryonic stem cells 
from farm animals have not been successful. 


(3) Add animal-free growth serum to multiply cells


Most work so far has been on myosatellite
cells, the adult stem cells that are responsible 
for muscle growth and repair. These can be 
obtained by a relatively harmless muscle biopsy
from a pig, cow, sheep, chicken or turkey; the 
desired cells are then extracted using enzymes 
or pipetting, and multiplied in culture. 
Morris Benjaminson, professor emeritus 
at Touro College in New York, prefers a dif- 
ferent approach — planting the whole biopsy 
in a dish. “We use the whole business without
the brain,” he says. “We don’t break it down
to cells and put them back together again.” He 
used this method to grow goldfish fillets in his 
lab in 2002, boosting the surface area by up to 
79% over a week by adding a shot of extra cells 
from ground-up muscle. It’s not clear, however,
whether this procedure could produce
enough muscle for a commercial enterprise. 
The fundamental problem is that myosatellite 
cells will only divide dozens of times, probably 
because their telomeres — the protective ends of 
the chromosomes — wear down with age. There 
are ways of boosting their proliferation. One is 
to add a gene for the repair enzyme telomerase. 
Another, being investigated by the start-up com- 
pany Mokshagundam Biotechnologies in Palo 
Alto, California, involves inserting a tumour- 
growth-promoting gene. But genetically modi- 
fied lab-grown meat might be too much for 
consumers to swallow. “Try selling that,” laughs
Post. An alternative is to get cells from a young
animal and perfect the rest of the system, such as
the culture medium, to maximize growth. 

For now, Post uses regular cell-culture 
medium to grow his pork myosatellite cells. 
This contains fetal calf serum which, as it cur- 
rently comes from dead cows, largely defeats 
the point of synthetic meat. It also contains 
antibiotics and anti-fungal agents that might 
not be good for human consumption. “Suppos- 
edly you could be allergic to these; you never 
know,” Post says. To get the cells to differentiate
into muscle, he shifts to horse serum, which 
has the same list of problems. 

Animal-free media made from a slurry of 
plants or microbes are commercially available 
for biomedical work such as in vitro fertili- 
zation. But like animal-based media, they're 
expensive — right now, growth media account 
for about 90% of the material costs of lab- 
grown meat. And their composition is propri- 
etary, making them difficult to customize. One 
alternative might be to use ground-up maitake 
mushrooms, which Benjaminson found works 
just as well as calf serum for his fish fillets. At 
the University of Amsterdam, researchers 
working on in vitro meat have been developing 
a cheap medium made from blue-green algae, 
with added growth factors made in genetically 
modified Escherichia coli. But no one has yet 
developed a way of making a cheap, animal- 
free growth serum in large quantities. 


BEEF IT UP 
Myosatellite cells grown on a scaffold will fuse 
into myofibres, which then bundle together to 
make up muscle. But lab-assembled muscles 
are weak and textureless. “It’s like when you 
take off a cast after six weeks,” says Post. To 
get the muscle to bulk up with protein requires 
exercise. Assembling the myofibres between 
anchor points helps, as this creates a natural 
tension for the muscle to flex against. Post uses 
this type of arrangement to boost the protein 
content of a muscle strip from 100 milligrams 
to about 800 milligrams over a few weeks. He 
also administers 10-volt shocks every second, 
which can bump protein content up to about a 
gram. This much electricity would be expen- 
sive in a scaled-up industrial process, so his 
group is hoping to learn how to mimic chemi- 
cal signals that tell muscles to contract. 
Vladimir Mironov of the Medical University
of South Carolina in Charleston is instead
using a scaffold made of chitosan microbeads
(chitosan can be sourced from crabs or fungi)
that expand and contract with temperature
swings, thus making a natural fitness centre for
his muscle strips.

Exercise muscle to boost protein

If lab-grown muscle gets more than about 
200 micrometres thick, cells in the interior start 
to die as they become starved of nutrients and 
oxygen. Post simply grows many small strips 
that could be ground up into a sausage. Oth- 
ers, including Mironov, are using blender-sized 
bioreactors of the type developed by NASA to 
study muscle growth in low gravity. These con- 
ditions help prevent cell clumping and improve 
transport of oxygen and nutrients. 

Growing meat on an industrial scale would
require large, customized bioreactors like
those used by biopharmaceutical companies.
Mironov estimates that a commercial in vitro
meat facility would need a five-storey building
of bioreactors, with a similarly huge investment.
And all that is just for manufacturing […]
The prospect of growing steaks
is a much bigger challenge, requiring a system
of fake ‘blood vessels’ built into the meat.
That is decades away.


MARKET IT 
The thing that enthusiasts for fake meat talk 
least about is its taste, perhaps because they 
haven’t tried it. In the United States, researchers
have largely avoided eating anything grown
in the lab for fear of violating a Food and Drug 
Administration regulation (it’s unclear whether 
it is actually forbidden) or being seen as pub- 
licity hounds. When Benjaminson grew his 
goldfish fillets, his team dipped them in olive 
oil, fried them in breadcrumbs and gave them to 
an ‘odour and sight’ panel who said they seemed
edible, but who weren't allowed to try them. 
Researchers generally believe that if they can 
get the texture right, taste will follow — par- 
ticularly once flavouring is added. Fortunately, 
myosatellite cells can also turn into fat, which 
would add to the taste. At Mokshagundam
Biotechnologies, the goal is to make a spam-like
mix of different muscle and other cell types
that provide the ‘umami’ taste that characterizes
meat. Scientists will also have to find a way
of adding nutrients such as iron (which comes
from blood) and vitamin B12 (which comes
from gut bacteria).

Grind up thousands of muscle strips
Add flavour, iron and vitamins

The process won't be cheap. By one rough 
estimate, first-generation lab meat could cost 
€3,500 per tonne (compared with €1,800 per 
tonne for unsubsidized farmed chicken meat).
Mironov thinks the best way to secure an early 
market is by turning in vitro meat into a ‘func- 
tional’ food attractive to the rich and famous, 
perhaps by filling it with compounds that promote
health or suppress appetite. “Only Hollywood
celebrities like Paris will be eating this,” he
says. Alternatively, one could get an edge on the 
market by making meat products from exotic 
or even extinct animals, assuming a few of their 
cells could be saved. In the long run, advocates 
see a market in vegetarians and others who want 
guilt-free and environmentally friendly meat. 

Researchers such as Post believe that the sci- 
entific and technical advances needed to make 
and sell in vitro meat are worth the fight — but 
convincing funders remains the biggest obstacle. 
Post's funding from the Dutch government ran 
out in 2009; he came out of that with hundreds 
of pork strips, not the thousands he needs for 
a sausage. Today, a couple of umbrella organi- 
zations promote the cause, including the non- 
profit New Harvest, which provides small funds 
for US- and Europe-based work, and a new 
commercial company, California-based Pure 
Bioengineering, which aims to raise venture 
capital. But no windfalls have arrived as yet. 

Post will keep seeking funding for his dem- 
onstration sausage — but he knows that raising 
enough to commercialize the entire process 
will be a huge ask. “I usually say €100 million,”
says Post. “That’s the number I forward to the
government, and then they faint.”


Nicola Jones is a freelance journalist based in 
Vancouver, Canada.


Increased patenting and licensing could lead to pharmaceutical breakthroughs in developing countries such as India, but slow progress in other areas.


Lessons from Bayh-Dole


Developing countries wanting to boost commercialization of their academic research 
should learn from the mistakes of US patenting legislation, says Bhaven N. Sampat. 


Thirty years ago this month, the US
Congress passed the Bayh-Dole
Act. The policy replaced a mishmash 
of rules that had governed the ownership 
of patents resulting from publicly funded 
research. Under the act, grantees and con- 
tractors, instead of government funding 
agencies, hold title to inventions. 
Bayh-Dole has been widely celebrated for 
its effect on US universities. Since its passage, 
the number of patents that universities have 
been granted has climbed from fewer than 
300 a year to more than 3,000. And, hav- 
ing earned very little from licensing before 
the act, US universities now earn almost 
US$2 billion annually.
Policy-makers in other countries have 
taken these trends as evidence that promoting
patents and exclusive licensing on the outputs
of taxpayer-funded research enhances tech- 
nology transfer, commercialization and 
innovation. This has led numerous devel- 
oping countries — including South Africa, 
the Philippines and Brazil — to enact Bayh- 
Dole-style legislation. Others, including 
India, are considering similar approaches. 
Yet countries looking to boost commer- 
cialization should be wary of the myth that 
the act transformed US universities into 
entrepreneurial institutions capable of generating
successful spin-off firms, high-tech jobs and
self-sustaining research funds, and all at no cost
to the taxpayer. Instead, they
should note the problems that have arisen
with the act, such as the overly restrictive 
patenting and licensing mentality it has 
generated among many technology-trans- 
fer offices, and craft their own legislation to 
avoid these pitfalls. 

The Bayh—Dole legislation was passed in 
response to a particular set of US problems at 
a particular time. An important motivation 
was to give universities the right to patent 
drug compounds, and to exclusively license 
them to companies. Before the act, to do 
either was difficult because of bureaucracy, 
particularly at the Department of Health, 
Education and Welfare. Policy-makers were 
also concerned that aggressive patent poli- 
cies established in the 1960s by the National 
Institutes of Health’s medicinal-chemistry
programme had reduced collaboration
between universities and industry. 

Another major concern in the 1970s was 
the allegedly low rate of commercialization 
of federally funded research, including that 
conducted outside universities. Less than 5% 
of the 28,000 patents owned in 1976 by the 
government were licensed to industry.

The economic argument for allowing com- 
panies exclusive access to drug compounds 
is a strong one. Universities generated nearly 
one-fifth of the drugs with the greatest clini- 
cal impact approved by the US Food and 
Drug Administration during the past three 
decades. It is hard to imagine that the profit-
oriented companies who developed these
drug candidates and put them through clini- 
cal trials would have invested the hundreds 
of millions of dollars needed if competitors 
could copy and market the drug themselves. 


ANTIQUATED ARGUMENTS 

Thirty years on, the 1976 licensing figure 
and the rise of university licensing since 
1980 (see ‘Technology transfer’) form the
central arguments used to claim that the 
Bayh-Dole Act was needed to boost tech- 
nology transfer for all government-funded 
research, not just for pharmaceuticals. But 
these figures are misleading because they 
downplay the other ways in which univer- 
sities contribute to economic growth and 
innovation. Researchers also disseminate 
their findings and ideas through consulting,
publishing and teaching. Indeed, the
development of numerous US industries — 
including chemical engineering, aeronautics, 
computing and agriculture — relied heavily 
on academic research, but with little or no 
university patenting.

Although universities would probably not 
have made as much money, many of the non- 
drug technologies licensed after Bayh—Dole, 
including some of the most lucrative biotech- 
nology techniques, would have been picked 
up anyway from academic publications and 
other traditional channels of dissemination. 
The Cohen-Boyer patent, for example, which 
covers recombinant DNA cloning techniques 
and is held jointly by Stanford University in 
California and the University of California, 
has generated more than $250 million, but 
even Niels Reimers, who managed Stanford 
University’s licensing programme at the time, 
noted in a 1997 interview that “whether we 
licensed it or not, commercialization of 
recombinant DNA was going forward”.

In short, Bayh-Dole replaced one set of
frictions with another: it eliminated
restrictions on patenting and technology- 
transfer licensing in favour of promoting 
excessive patenting and overly restrictive 
licensing. The growing aggressiveness of 
some technology-transfer offices in assert- 
ing their patents is now souring relationships 
between universities and industry, especially
in information technology. Wayne Johnson,
vice-president for university relations at 
computer giant Hewlett Packard in Palo 
Alto, California, testified before Congress 
in 2007 that Bayh-Dole has “fuelled mistrust,
escalated frustration, and created a
misplaced goal of revenue generation, which 
has moved universities and industry farther 
apart than they’ve ever been”. 

Developing countries should not follow 
the United States in enacting policies that 
undermine traditional ways of commercial- 
izing research output. Patents and exclusive 
licences can boost technology transfer when 
significant follow-on investment is needed to 
promote commercialization — for instance, 
in the development of pharmaceutical com- 
pounds or prototypes for medical devices. 
But outputs that can be used off-the-shelf, 
such as computer software and biotech- 
nology techniques, can be more effectively 
transferred by academic publishing, 
collaborations and teaching. 


TECHNOLOGY TRANSFER?

University patents and licences have multiplied
in recent decades, but this says little about the
amount of technology transfer that is happening.
[Figure: patents granted to US universities by the US Patent and Trademark Office.]

In India, a version of Bayh-Dole-type
legislation, drawn up in 2008, came close 
to mandating the patenting of all academic 
research output; institutions that did not 
comply would risk having their funding 
withdrawn. An outcry from academics and 
others has led policy-makers to soften their 
approach. But the policy now under consideration
still encourages patenting and licensing
across the board — for example, for many 
of the software inventions emerging from 
Indian laboratories. In the Philippines, the 
recently passed Bayh—Dole analogue similarly 
fails to distinguish between inventions that 
should and shouldn't be patented, although 
regulations to control how the legislation 
is implemented are still being developed. 

Indeed, policies promoting broad and 
aggressive patenting may be more of a prob- 
lem in developing countries now than they 
were 30 years ago in the United States. More 
things are legally considered patentable, and 
under-resourced patent offices may struggle
to weed out applications that aren't truly
innovative. Legislators in developing countries
need to distinguish between, and provide
guidance on, the types of research that should 
be patented and exclusively licensed, and 
those that should be widely disseminated. 

Countries considering Bayh-Dole-type
legislation should also be prepared to subsi- 
dize their technology-transfer offices. Few 
US universities are making large returns
and many make negligible income or even 
a net loss. One approach to this problem is 
for funders to allocate grant money for the 
management of intellectual-property rights 
for the types of research likely to need it. 

A complicated issue for developing countries
is whether they should treat academic patents 
and licences as a way to ensure that domes- 
tic firms and consumers, rather than large 
multinational companies, enjoy most of the 
benefits of taxpayer-funded research. This 
is particularly salient in countries without 
strong indigenous commercial capability, and 
where companies from developed nations 
might be better able to exploit innovations. 
Here again, drugs are a special case. For drug 
candidates with substantial markets in devel- 
oped countries — those for ‘global’ diseases 
such as HIV or cancer — university licensing 
policies could be designed to simultaneously 
promote local access and preserve strong 
incentives for drug development.

There is no one-size-fits-all solution. Given 
the growing importance of developing- 
country universities in the global scientific 
enterprise, and the importance of public sec- 
tor research for development, it is crucial that 
nations base their patent-reform laws on a 
clear-eyed assessment of their own problems 
and priorities. The choices made today will 
have profound effects on research, innovation 
and society for decades to come.


Bhaven N. Sampat is in the Department of 
Health Policy and Management at Columbia 
University’s Mailman School of Public 
Health, New York, New York 10032, USA. 
e-mail: bns3@columbia.edu

More for the research dollar


Funders and universities should make the products of research more available — even if 
today’s researchers pay a price, say Jeffrey L. Furman, Fiona Murray and Scott Stern. 


On 29 October, the US government
filed a brief stating that isolated but
unmodified DNA should not be patented
because merely isolating something
does not turn it into a man-made product. 
This statement — submitted in a high-profile 
lawsuit over the validity of patents covering 
two genes linked to cancer — may or may 
not prevent the US Patent and Trademark 
Office adding new gene patents to the thousands
already issued. But the move has
deepened the chasm between advocates 
of patenting research findings, and those 
calling for free and open access to publicly 
funded research. 

The dispute over whether genes should be 
patented (or whether it is even legal to patent 
them) is typical of a wider debate. Research- 
ers, open-source software designers, 
technology-transfer offices and entrepre- 
neurs tend to fall into one of two camps with 
opposing opinions over whether patents, and 
intellectual-property rights over scientific 
findings, are ‘good’ or ‘bad’.

Recent research in economics paints a far 
more complex picture. It suggests that scientific 
progress is not held up by intellectual- 
property rights per se but by the short-sighted 
ways in which these rights are often managed. 
It is time for scientists, universities and, in particular,
funding agencies to start acting on such findings.

NATURE.COM
US government to limit gene patents:
go.nature.com/nfyyos

The concept of ‘governance’ (the rules,
expectations and practices through which
people, organizations or resources are
controlled) is as relevant to research institutions
as it is to corporations. Thirty years ago
this month, the Bayh—Dole Act altered the 
governance of science in the United States 
by replacing a confusing mass of rules over 
the ownership of patents with an overarch- 
ing policy. The act gave universities — not 
funding agencies — the right to file and own 
intellectual property for inventions resulting 
from publicly funded research. 


SLOWING PROGRESS 

Economics is now beginning to shed light on 
the real-world effect of different governance 
schemes on scientific progress. Take a recent 
study by Heidi Williams from the National
Bureau of Economic Research in Cambridge, 
Massachusetts. This shows that, in the race 
to sequence the human genome, the dif- 
ferent approaches to intellectual-property 
management adopted by Celera Genomics, 
then in Rockville, Maryland, and the Human 
Genome Project had a dramatic effect on the 
rate of follow-on research. 

In the late 1990s, Celera Genomics, headed 
by Craig Venter, used copyright law to limit 
access to the firm’s gene-sequence data. By 
contrast, the US-government-funded Human 
Genome Project made its data available with 
minimal restrictions. Using indicators such 
as patents, numbers of papers published 
and commercially available diagnostic tests, 
Williams compared the rate of research
associated with genes sequenced by Celera
to that associated with genes sequenced by 
the Human Genome Project. She found that a 
diagnostic test was 30% less likely to be devel- 
oped for Celera-sequenced genes. 

Another set of studies involving patents 
owned by Harvard University in Cambridge, 
Massachusetts, and the US chemicals company
DuPont demonstrates how shifts in
governance can enhance scientific and tech- 
nological progress. In the 1990s, DuPont 
required academics and researchers work- 
ing for other companies to sign complex 
licensing agreements to use or develop 
two technologies used in mouse genetic 
engineering — the company’s Cre-lox 
recombinant technology and the Onco- 
Mouse (a mouse strain modified to carry a 
cancer-causing gene developed at Harvard 
and exclusively licensed to DuPont). 

Harold Varmus, then director of the US 
National Institutes of Health, established an 
agreement with DuPont in the late 1990s 
that changed how the company’s patents 
were managed. Clear, simple licensing
guidelines, and low-cost access to the mice 
enhanced follow-on research and prompted 
a burst of activity in novel areas. Mouse strains
derived from these technologies were cited
at a rate 30% higher than expected for
several years after the policy change.

Other work suggests that restricting 
researchers’ physical access to resources 
can be as damaging as doing so through 
contracts. For years, cell biology has been
hampered by scientists storing cell lines
independently. A recent study shows that the 
numbers of papers linked to 108 cell lines 
jumped more than 50% within 3 years of such 
lines being transferred to biological resource 
centres — such as the American Type Culture 
Collection in Manassas, Virginia (see ‘The
positive effect of access’). Thus, even when 
biomaterials are unencumbered by intellec- 
tual-property rights, making them accessible 
through a trusted, open-access resource 
centre increases their effect on research. 

The challenge is to provide incentives for 
today’s researchers to create and character- 
ize novel materials, models and databases, 
while ensuring that tomorrow's researchers 
can access and use these resources to enhance 
their own productivity. We recommend, first, 
that scientists and policy-makers establish 
rules of practice that maximize the produc- 
tivity of research in the long term — even 
if those rules cost today’s researchers some 
inconvenience or loss of competitive edge. 
The data-sharing strategy used by sequenc- 
ers of the human genome offers a striking 
example of the effectiveness of this type of 
long-range planning. 

In 1996, those involved in sequencing the 
human genome, including the US National 
Institutes of Health and the UK Medical 
Research Council, introduced the Bermuda 
Rules. These essentially require publicly 
funded researchers to deposit their sequenc- 
ing data on a daily basis. Where researchers 
once had a monopoly over their data for sev- 
eral months, they now have sole access for less 
than 24 hours. In the short term, sequencers 
are less able to extract private value from their 
work. The benefits to subsequent research 
generations, however, in being able to quickly 
and easily access new sequence data soon after 
it is generated, have been enormous. 


GREATER DISCLOSURE 

Our second recommendation is that as a 
default, licensing transactions resulting 
from publicly funded research be disclosed. 
The results of research are generally made 
accessible through publishing, but materials 
— such as cell lines or tissue samples — and 
licensing contracts can be extraordinarily 
hard to obtain. For instance, at least nine 
patents owned by eight different entities
cover the PSEN2 gene for a membrane pro- 
tein. Although the information regarding 
ownership of intellectual-property rights is 
published by the US Patent and Trademark 
Office, neither universities nor companies 
publicize which companies have licensing 
contracts with the patent owners. 

To change this, funding agencies should 
insist that licensors report each transac- 
tion including the identity of licensees and, 
when feasible, the structure of the transaction.
A standardized, accessible database
of such transactions (managed perhaps
by the US National Science Foundation)
would reduce future transaction costs for
innovators trying to build on ideas with
many different patented elements.

THE POSITIVE EFFECT OF ACCESS
The number of papers citing research on 108 cell lines rose rapidly after the cell lines were moved to a
centralized, open-access culture collection. Data normalized by cell line, age of research and year of citation.

Third, licences and other access rules 
should be structured so as to enable further 
research by as diverse a group of scientists, 
innovators and entrepreneurs as possible. 
This does not happen for many resources. For 
example, roughly 60% of university licences 
are awarded exclusively to single companies.
This means that scientists at other institutions 
or companies invariably have to pay for, or are 
prohibited from using, particular ideas. It also
means that it is up to licensees whether others 
can use the university's intellectual property to 
develop novel applications, or make resulting 
products available to the widest set of users, 
including in developing countries. 

The Bayh-Dole Act grants universities 
flexibility in shaping how intellectual-prop- 
erty rights are used, but most funders are 
passive in ensuring that such rights don't 
inhibit cumulative research. In the Onco- 
Mouse case, for instance, policy-makers 
reacted only after a decade of dispute. Some 
technology-transfer offices have tried to come 
up with standard language for transparent 
licensing agreements to ensure, for example, 
global access to ideas and to the products gen- 
erated from them. Although not yet widely 
adopted by universities, such an approach 
provides a valuable starting point. 

Encouraging the broadest possible use of 
resources must apply to physical access as 
well. Some well-intentioned foundations, such 
as the International Myeloma Foundation 
(IMF) in North Hollywood, California, have 
taken the lead in establishing crucial disease- 
specific resources, including patient tissue 
samples. But, like the IMF, some foundations
have granted only a select set of researchers 
access to the samples in the hope of attracting 
them to unique research opportunities. A bet- 
ter model is provided by the Coalition Against 
Major Diseases established by the Critical
Path Institute, in Tucson, Arizona. In June
this year, the members — including patient 
advocates, pharmaceutical companies, and 
various institutes and agencies — agreed to 
pool and share data from failed Alzheimer’s 
disease clinical trials, thereby broadening 
access to otherwise proprietary data. 

At a time when the public funding of science
is under intense scrutiny, tremendous
opportunity exists to establish policies that 
would greatly increase the impact of every 
dollar of research funding spent.

CORRECTION

The Comment article ‘Tar sands need 
solid science’ (D. Schindler Nature 468, 
499-501; 2010) stated that the 650 km²
footprint of the tar-sands mining is one-hundredth
the size of Alberta or Texas. It is
one-thousandth the size of those areas.

The 1940s Atanasoff-Berry Computer (ABC) was the first to use innovations such as vacuum tubes.

The ABC of computing


An engaging biography of John Atanasoff reveals the 
obscure origins of the computer, explains John Gilbey. 


Who invented the digital computer?
Depending on your definition,
mathematical pioneers such as
John von Neumann or Alan Turing might
spring to mind, but its origin lies with US 
physicist John Atanasoff. Although few peo- 
ple could name him today, this rewarding 
biography by Pulitzer prizewinning author 
Jane Smiley may change that. 

Atanasoff embodies the American Dream. 
The son of a Bulgarian immigrant who had 
fled to the United States as a child in the 
late 1880s, he grew up on the family farm 
in Florida. Through mastering the slide 
rule, helping his father with house electrical 
wiring and driving the family’s Model T Ford 
at age 11, he developed a passion for engi- 
neering and mathematics. 

After graduating from the University of 
Florida in Gainesville in 1925, with the high- 
est grade average it had ever recorded, Atana- 
soff joined a master’s programme at what is 
now Iowa State University in Ames. He turned
down an offer to move to Harvard University
and gained a PhD in physics at the Univer- 
sity of Wisconsin-Madison. He returned to 
Iowa State — again declining an offer from 
Harvard — as an assistant professor. 

In The Man Who Invented the Computer, 
Smiley describes how Atanasoff developed 
an interest in mechanical calculators and 
modified an IBM tabulator to suit his own 
needs. But to meet his wider scientific aspira- 
tions — in particular, to solve simultaneous 
linear equations quickly — he realized that 
he would have to build a calculator himself. 
His struggle to design it concluded with an 
episode of pure cinema. Atanasoff, “unhappy 
to an extreme degree”, jumped in his car and
drove more than 300 kilometres to the shore 
of the Mississippi River. Sitting in a roadside 
tavern with a glass of bourbon and soda, the 
solution fell into place. He began to make 
notes on a paper napkin. 

Crucially, Iowa State had an excellent 
college of engineering. In 1939, Atanasoff
teamed up with recent graduate Clifford
Berry to develop the system that became 
known as the Atanasoff-Berry Computer 
(ABC). Built on a shoestring budget, the 
simple ‘breadboard’ prototype that emerged 
contained significant innovations. These 
included the use of vacuum tubes as the com- 
puting mechanism and operating memory; 
binary and logical calculation; serial com- 
putation; and the use of capacitors as storage 
memory. By the summer of 1940, Smiley tells 
us, a second, more-developed prototype was
running and Atanasoff and Berry had writ- 
ten a 35-page manuscript describing it. 

Other people were working on similar 
devices. In the United Kingdom and at 
Princeton University in New Jersey, Turing 
was investigating practical outlets for the 
concepts in his 1936 paper ‘On Computable
Numbers’. In London, British engineer
Tommy Flowers was using vacuum tubes as 
electronic switches for telephone exchanges 
in the General Post Office. In Germany, 
Konrad Zuse was working on a floating-point 
calculator — albeit based on electromechani- 
cal technology — that would have a 64-word 
storage capacity by 1941. Smiley weaves these 
stories into the narrative effectively, giving a 
broad sense of the rich ecology of thought 
that burgeoned during this crucial period of 
technological and logical development. 

The Second World War changed every- 
thing. Atanasoff left Iowa State to work in 
the Naval Ordnance Laboratory in Washing- 
ton DC. His prototype computer remained 
unpatented in the basement of the physics 
department until the machine was broken 
up in 1948. The exigencies of war meant that 
substantial resources were made available 
for key computing projects such as the vast 
Electronic Numerical Integrator and Computer
(ENIAC) machine at the University of
Pennsylvania in Philadelphia, the launch 
of which Atanasoff attended in 1946. But 
Atanasoff moved on, and in 1951 went into
business for himself. His Ordnance Engineering
Corporation was sold for a healthy profit five
years later.

John Atanasoff built the first electronic computer.

The Man Who Invented the Computer: The Biography of John Atanasoff, Digital Pioneer
JANE SMILEY

Atanasoff was brought back into the
picture by the untimely death of Berry in an
apparent suicide in 1963. Concerned,
Atanasoff travelled to New York to investigate.
The family considered that murder
was a possibility (Berry’s father had been
shot decades earlier by a disgruntled
ex-employee), but it was never proven.

In 1973, Atanasoff again found himself 
in the spotlight after his work was cited in 
the conclusions of a patent dispute between 
computing-industry giants Honeywell 
and Sperry Rand about the early develop- 
ment of the digital computer. Smiley quotes 
Judge Earl Larson’s acknowledgement that 
“between 1937 and 1942, Atanasoff... devel- 
oped and built an automatic electronic dig- 
ital computer for solving large systems of 
simultaneous linear algebraic equations”. 

Judge Larson further noted that John 
Mauchly, one of the ENIAC developers 
who had visited Atanasoff in Iowa, had 
inspected the Atanasoff-Berry Computer 
and had read the manuscript describing it. 
Mauchly derived from this, the judge said, 
“‘the invention of the automatic electronic
digital computer’ claimed in the ENIAC pat- 
ent” — indicating Atanasoff’s key contribu- 
tion, albeit unwitting, to the later project. 

Belatedly, and largely through the advo- 
cacy of friends and writers, Atanasoff gained 
recognition. Owing to his father’s origins, he 
received early plaudits in Bulgaria, where 
in 1970 he was granted the Order of Cyril 
and Methodius, First Class. In 1990 he was 
awarded the National Medal of Technol- 
ogy by President George H. W. Bush for 
his invention of the electronic digital computer
and for contributions to the development
of a technically trained US workforce.
Atanasoff died in 1995. 

The Man Who Invented the Computer is 
a vivid telling of the early story of the com- 
puting industry. By focusing on Atanasoff, 
Smiley blends obscure threads with those 
that are better known. The result would, 
without embellishment, make an exceptional 
feature film.


John Gilbey teaches in the Department 
of Computer Science at Aberystwyth 
University, Aberystwyth, Ceredigion 
SY23 2AX, UK. 

e-mail: gilbey@bcs.org.uk

Books in brief


Wicked Company: Freethinkers and Friendship in 
Pre-Revolutionary Paris 

Philipp Blom BASIC BOOKS 384 pp. $29.95 (2010)

The French Enlightenment’s triumph of reason over religious 
dogma was plotted in an eighteenth-century Paris salon. Hosted by 
Baron Paul-Henri Thiry Holbach, the radical thinkers who gathered 
there included the philosophers Denis Diderot and Jean-Jacques 
Rousseau. Historian Philipp Blom revives their legacy and examines 
the rivalries that sprang up among the group and with competitors 
such as the writer Voltaire. Their ideas about society and the natural 
world went on to influence politics and science globally. 


How Old is the Universe? 

David A. Weintraub PRINCETON UNIVERSITY PRESS 380 pp. $29.95 (2010)

Astronomer David Weintraub explains in his latest book how we 
know that the Universe is 13.7 billion years old, a finding that has 
had an impact on science, philosophy and religion. By looking at the 
various ways in which the age of the cosmos has been established 
over the centuries, from the lifecycles and pulsations of stars
to galactic structures and cosmology, he reveals the process of
scientific enquiry and shows how astronomers gather evidence to 
grapple with deep questions. 


The Abacus and the Cross: The Story of the Pope Who Brought the 
Light of Science to the Dark Ages 

Nancy Marie Brown BASIC BOOKS 328 pp. $27.95 (2010)

Far from being intolerant of science, the medieval Catholic Church 
saw reason as a means of getting closer to God. In the year 1000, 
there was even a ‘scientist pope’: Gerbert of Aurillac was the leading 
mathematician and astronomer of his day. Science writer Nancy 
Marie Brown describes his dramatic rise from humble peasant to 
visionary pontiff. A mathematics teacher to kings, and occasional 
spy, he adopted scientific ideas from the Islamic world, including the 
nine Arabic numerals and the concept of zero. 


The God Instinct: The Psychology of Souls, Destiny and the Meaning of Life

Jesse Bering NICHOLAS BREALEY PUBLISHING 288 pp. £16.99 (2010)

Psychologist Jesse Bering argues that religious beliefs are a
sophisticated cognitive illusion rather than an irrational delusion.
Because we have the ability to think beyond our immediate
surroundings, we have evolved a tendency to project the idea that a
transcendent being, or God, influences our lives. Taking a balanced
and considered approach to this often inflammatory topic, he
explains why this religious trait has evolutionary benefits and why it
sets us apart from other animals.


Hope is an Imperative: The Essential David Orr 

David Orr ISLAND PRESS 400 pp. £31 (2010) 

Key writings of environmental scientist David Orr from the past
30 years are collected in this volume. A champion of ecological
design, Orr explains why it is important to educate people about 
sustainability, why university campuses should be green, and the 
environmental consequences of bringing children into the world. 
Leading a push within his own town of Oberlin, Ohio, to embrace 
green building practices, he reveals why he is both an optimist and 
a pragmatist.

Sky Mirror (2006) in London’s Kensington Gardens is one of four Anish Kapoor works manufactured by the process that is used to grind scientific optics.


Engineering art 


Neil Dodgson admires the technical mastery of sculptor Anish Kapoor. 


Anish Kapoor: Turning the World Upside Down
Kensington Gardens, London, until March 2011.

Anish Kapoor
National Gallery of Modern Art, New Delhi, until 27 February 2011;
and at Mehboob Studios, Mumbai, until 16 January 2011.

British artist Anish Kapoor is famous for his architectural sculptures
and vivid use of colour. His works are also feats of engineering, his
speciality at university before he left to pursue his art. From
ArcelorMittal Orbit, a tower of twisted steel chosen as the centrepiece
for London’s 2012 Olympic park, to Svayambh, a gliding slab of
blood-red wax, the significance of Kapoor’s installations lies in how
they are built.

Kapoor, who is exhibiting in London, New Delhi and Mumbai, regards his
sculptures as embodiments of “mythologies” that include the process of
their creation. “Meaning is gradually constructed, just as the object is
constructed,” he explains. The shows in India highlight his dynamic
artworks (shown in the past year at London’s Royal Academy and at the
Guggenheim Bilbao in Spain), which use machinery to evoke a sense of
change. His current London exhibition, in Kensington Gardens, features
four highly polished stainless-steel forms that distort reflections of
their surroundings like fairground mirrors.

The genesis of that series lies in Kapoor’s collaboration with Cecil
Balmond, head of the Advanced Geometry Unit at engineering firm Arup.
Kapoor first worked with Balmond a decade ago to produce a sculpture for
the cavernous turbine hall at London’s Tate Modern. He helped Kapoor to
refine his aesthetic ideas, bringing expertise in construction
techniques, the tensile strengths of materials and the limits of
manufacturing. The product was Marsyas (2002): two massive steel rings
joined by a red PVC membrane stretched 140 metres between them,
supporting a third steel toroid above visitors’ heads. Balmond
reprogrammed Arup’s in-house software to model the membrane’s precise
form.

A discarded design later appeared in 
Chicago as Cloud Gate (2004): a 10-metre- 
high, jelly-bean-like arch of polished steel. 
The forms in Kensington Gardens are similar 
in style. Cut from segments of a sphere, they 
were produced by the same process that is 
used for grinding large scientific optics. One, 
C-Curve (2007), reminds me of a smaller mirror
in my office: a relic of a prototype
three-dimensional television. That too is beautifully
made, but its bending of light is directed by 
a practical purpose. By contrast, Kapoor's 
curved mirrors are engineered to reflect the 
viewer's inner world. 

In Kapoor’s recent foray into dynamic 
works, now on show in India, exquisite 
engineering underlies other mythologies. In 
Svayambh (2007), an enormous block of red 
wax creeps along a hydraulic track, appar- 
ently being shaped as it passes through several 
gallery doorways. The name derives from a 
Sanskrit word, referring to that which is cre- 
ated of its own accord, rather than by a human 
hand. In fact, little wax is scraped off the 
installation after its first pass. Kapoor delights 
in this fiction: “The wax is not literally carved 
by the doorways, although it appears to be.” 

Kapoor's meanings are complex and lay- 
ered. Svayambh, he explains, represents geol- 
ogy, body, blood and viscera, among other 
themes. It is difficult to engineer such a piece, 
with its combination of motors, mechanism 
and soft material requiring careful design 
and constant maintenance. Questioning the 
artist’s intentions and methods unveils the 
fiction that the artwork formed itself. 

A second wax piece seems more convinc- 
ingly self-made. Shooting into the Corner 
(2008-09) is a large air-fired gun that fires 
11-kilogram cylinders of red wax across 
the gallery every 20 minutes. The result is 
a chaotic pile. No artist directs its creation; 
random perturbations are caused by varia- 
tions in the consistency of the wax, the gun 
pressure and in how the deposits accumulate. 
Yet it is stage-managed. The art is not in the 
wax mound but in the whole performance. 
Another artful machine generated a set of 
extruded grey concrete sculptures called Grey- 
man Cries, Shaman Dies, Billowing Smoke, 
Beauty Evoked (2008-09). These were produced
by a scaled-up version of a rapid prototyping
machine. Such technology is normally
used by engineers to build accurate models 
from fine threads of molten plastic. Kapoor's 
larger version extrudes a thick concrete sau- 
sage that builds up layers of soft coils, ropes 
and worms under computer control. 
Whereas engineers seek precision with 
their models, Kapoor delights in his products’
imperfections. Yet the appearance of 
randomness involves technical sleights of 
hand. To achieve each particular texture, his 
contraption must be finely tuned. Kapoor 
deliberately finds a point of balance between 
opposites — between perfection and imperfection,
softness and firmness, movement and 
repose — to tantalize the viewer. 

Kapoor knows he is treading a fine line 
between artist and entertainer. He says: “It’s 
a short trip from Disneyland to something 
truly mysterious.” But that mystery is delivered
only through precise engineering. 


Neil Dodgson is professor of graphics and 
imaging at the University of Cambridge, UK. 
e-mail: neil.dodgson@cl.cam.ac.uk 


Measure for measure 


A useful guide to citation analysis shows that counting 
publications is harder than it looks, finds Ton van Raan. 


Citation analysis offers a means to 
quantify the impact of a scientist’s 
work. One tool for tracking cita- 
tions is the Publish or Perish (PoP) software 
program developed by Anne-Wil Harzing, 
professor of international marketing at the 
University of Melbourne, Australia. Her 
guide describes how her program generates 
citation analyses from Google Scholar and 
gives an overview of bibliometric methods 
and sources. She champions the practical use 
of citation measures, yet also recognizes that 
calculating them reliably is a difficult task. 
The Publish or Perish Book focuses on 
citation analysis of individual researchers, 
not groups or institutes. Several metrics may 
be calculated for scientists and for journals, 
including their number of publications and 
citations, average number of citations per 
publication and per author, and the h-index, 
a widely used characterization of citation 
impact. Harzing argues using practical exam- 
ples that such indicators are good markers 
of a researcher’s influence, and are useful in 
assessing applications for jobs, promotion 
and tenure, and for literature research and 
choosing a journal in which to publish. 
Using Google Scholar as a data source is 
advantageous as it retrieves publications 
not covered by Thomson Reuters’ Web of 
Science: books, edited volumes and ‘grey’ 
literature such as conference proceedings. 
Harzing explains how to analyse citations 
with Google Scholar, and discusses ways that 
citation patterns of early reports can be used 
to predict the later impact of journal articles 
derived from them. But there are inevitable 
problems in tying together varied data, such 
as matching conference proceedings with the 
subsequently published paper. 

The Publish or Perish Book: A Guide to Effective and Responsible Citation Analysis
ANNE-WIL HARZING
Tarma Software Research: 2010. 250 pp. $29.95

Harzing considers the main downside of 
the Web of Science to be its limited coverage 
of different disciplines, particularly of
engineering, the social sciences and the
humanities. In my view, however, its coverage
of well-funded fields, such as the natural
sciences and medicine, is very good. For
novice users, the Web of Science does have
trouble identifying ambiguous author
names, especially those in which the order
of the first name and surname is unclear. It
also struggles to aggregate articles with variations
of the same title and to identify self-citations.
But professional bibliometricians such 
as myself build and work from Web of Sci- 
ence reconstructions — usually proprietary 
to their institutes — in which such sources of 
error are fixed. 

The book underplays the ethical issues that 
arise when performing a citation analysis for 
a person other than yourself. Verification of 
research output is important — missing just 
one highly cited paper can distort the results 
dramatically. This highlights the necessity 
of cleaning raw data. For instance, incorrect 
referencing will lead to cited publications 
being missed. It is a huge effort to correct 
for these ‘homeless’ citations. In this respect, 
Google Scholar is a black box. 

Harzing discusses both the perspective 
of the person to be evaluated, and that
of the evaluator. This is important because
evaluators of tenure promotions might
apply home-made metrics that are not
transparent and may incorporate
unknown mistakes. Citation metrics 
are attractive because they have the potential 
of objectivity, but evaluators may put too 
much faith in quantitative aspects of research 
performance. Simple metrics then become a 
shortcut to decision-making. 

Many scientists are concerned that cita- 
tion analysis, particularly that done in an 
amateurish way, is having detrimental effects 
on science. They fear that researchers are 
driven to pursue citation quantity instead 
of scientific quality. Statistical reliability 
may become a serious problem when deal- 
ing with individuals rather than groups, as 
Harzing recognizes. Field-specific normali- 
zation is also necessary if research impact is 
to be compared across disciplines. 

Further statistical factors limit metrics. 
Indicators often concern arithmetic mean 
values, yet the distribution of citations across 
publications is skewed. Averages are thus not 
the best statistic. Although this problem is 
discussed, Harzing’s book doesn’t offer indicators
that are related to the distribution of 
impact across a field, which would answer 
the question “Does he or she belong to the 
top 10% of his or her field?” 
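
The contrast between an average and a distribution-based indicator can be made concrete with a small sketch. The Python below is only an illustration and is not taken from Harzing's book or from the PoP program: the citation counts are invented, and `share_cited_more` and `in_top_fraction` are hypothetical helper names. It compares the mean and the median of a skewed set of counts, then asks whether a given count falls within the top 10% of the field.

```python
from statistics import mean, median


def share_cited_more(value, field_counts):
    """Fraction of the field's citation counts that are strictly higher than `value`."""
    return sum(c > value for c in field_counts) / len(field_counts)


def in_top_fraction(value, field_counts, fraction=0.10):
    """True if fewer than `fraction` of the field's counts exceed `value`."""
    return share_cited_more(value, field_counts) < fraction


# Invented, deliberately skewed citation counts for ten publications in one field.
field_counts = [0, 0, 1, 1, 2, 2, 3, 4, 5, 120]

print(mean(field_counts))    # pulled upwards by the single highly cited paper
print(median(field_counts))  # closer to the typical paper
print(in_top_fraction(120, field_counts))
print(in_top_fraction(5, field_counts))
```

With these invented numbers the mean is several times larger than the median, which is why a percentile-style question can be more informative than an average for skewed data.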

Harzing explains that the problem of the 
skewed distribution can be removed using 
the h-index: for instance, a researcher has an 
h-index of ten if ten of his or her papers have 
at least ten citations and the other papers 
have no more than ten citations each. But 
in my view, the h-index is inconsistent. For 
example, suppose that researcher A has three 
publications with five citations each (h=3) 
and researcher B has four with four cita- 
tions each (h=4). Both obtain one additional 
publication with five citations. Researcher 
A’s h-index then increases to four, whereas 
researcher B’s h-index remains equal to four. 
This makes no sense. 
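
The example can be checked with a few lines of code. This is a minimal sketch of the standard h-index definition (the largest h such that h papers have at least h citations each), not the PoP implementation; the function name `h_index` and the two researchers' citation lists are taken from, or invented to match, the example above.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


researcher_a = [5, 5, 5]        # three publications, five citations each
researcher_b = [4, 4, 4, 4]     # four publications, four citations each

print(h_index(researcher_a), h_index(researcher_b))              # 3 4
# Both gain one further publication with five citations.
print(h_index(researcher_a + [5]), h_index(researcher_b + [5]))  # 4 4
```

The same additional paper raises researcher A's index but leaves researcher B's unchanged, so the gap between them closes.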

With these caveats in mind, The Publish 
or Perish Book is a useful resource for scien- 
tists, particularly in fields in which Google 
Scholar is a major source of citations. 


Ton van Raan is professor of science studies 
in the Centre for Science and Technology at 
Leiden University, the Netherlands. 
A magical process 


Prince Charles’s call to stay close to nature follows a rich 
tradition of environmental thinking, says Philip Stott. 


This esteemed journal notwithstanding,
one should generally be wary
of those who present Nature with a 
capital N. Such commentators resent the 
indifference they perceive in the natural 
world to moral values, preferring to see in 
nature a reflection of their own persona. 
They reify and then deify the natural world, 
and worship it as virtuous. 

This is absolutely the tenor of Harmony, 
Prince Charles’s call to virtue through 
humans emulating “the natural order and the 
rhythm in things”. Like others before him, 
the heir to the British crown — who has long 
expressed views on the environment, and is 
a champion of organic and traditional farm- 
ing methods — wishes to speak on nature's 
behalf. By taking as our guide the rhythms 
and patterns that lie within us, he writes, we 
may build a more durable and pleasant soci- 
ety and acquire deep philosophical insights 
that are embedded in our traditions. 

For him, Nature's virtue lies in its ability to 
replenish itself efficiently and without waste, 
through cycles that amount to a “magical 
process”. This is a fascinating concept, until 
one remembers that more than 95 per 
cent of all life has been discarded on 
an Earth where volcanoes blow, robins 
kill robins, forests come and go, and 
viruses prey on all. 

Harmony clearly represents a 
personal statement by the prince. 
Although he thanks his co-authors, 
environmentalist Tony Juniper and 
radio broadcaster Ian Skelly, many 
paragraphs open with a resounding 
“I”. Charles’s mission is to articulate 
his belief that our broken connec- 
tion with nature will drive humanity 
to oblivion. The cover states that “our 
disconnection from Nature has con- 
tributed to the greatest crisis in the 
history of mankind”. 

Such a jeremiad is far from new. 
The prince is rehearsing environmen- 
tal themes that have a deep pedigree 
in European and American thought. 
The influential Vermont diplomat 
George Perkins Marsh, an early 
prophet of environmental concerns 
who wrote the 1864 masterpiece Man 
and Nature, would immediately recog- 
nize its tropes. So too would twenti- 
eth-century environmental writers 


such as Aldo Leopold, 
with his land ethic in 
A Sand County Alma- 
nac (1949), and H. J. 
Massingham with The 
Wisdom of the Fields 
(1945) — both pub- 
lished in the decade in 
which the prince was 
born. Charles would 
no doubt approve of 


Harmony: A New 
is so pronounced that 
it is dragging Western 
civilisation nearer and 
nearer to some fall like Lucifer’s”. 

All of these tracts rest on long-standing 
European foundations. These include the 
idea of the ‘virtuous rural’ expressed in the 
Roman poet Virgil’s Eclogues and Geor- 
gics, and the concept of the noble savage 
expounded in the late sixteenth and seven- 
teenth centuries by French essayist Michel de 
Montaigne and English poet John Dryden. 
They also have a basis in German romanticism,
including the philosophies of ‘the
organic’ and holism developed in the nineteenth
century by, for example, biologist 
Ernst Haeckel. As one might observe, there 
is little new under the solar panel. 

Prince Charles advocates a return to the natural order of things. 

The benevolent depiction of Nature so 
admired by the prince is conjured in his own 
well-meaning image. The cover of the UK 
edition — a charming set of neopastoralist 
cameos reminiscent of the inside covers of 
a classic Rupert Bear Annual — says it all. 
But for many, Nature is no place to seek an 
explanation of ethics, virtue or a sense of 
the numinous. There is little harmony in 
the modern ecological concepts of disequi- 
librium and non-equilibrium, which hold 
imbalance and constant change to be the 
essential state, nor in a different selection of 
philosophies and poems. Tennyson's power- 
ful 1849 poem In Memoriam, for example, 
tells of Nature’s inherently destructive char- 
acter: “I care for nothing, all shall go”. 

The prince muses on issues of long-term 
concern to him — the threat of global warm- 
ing, the promise of alternative medicines, the 
dangers in modern farming, the brutalities of 
urbanism and, like Massingham, the wisdom 
of the fields and of the past. There are incon- 
sistencies. For example, the classic doom- 
laden picture of a tropical forest seemingly 
laid waste by ‘slash and burn’ agriculturalists 
inevitably appears. Yet such interventions 
can make use of the traditional approaches 
favoured by the prince — many practitioners 
of shifting cultivation use field cycles that pre- 
serve soils on steep slopes and increase 
yields. By contrast, some First Nation 
peoples were highly destructive of their 
environments, not protective. 

Harmony is a mishmash of selected 
concepts of a reified Nature, rather 
a lot of mysticism and, admittedly, a 
selection of sound and welcome prac- 
tical comments on aspects of farming 
and urban living. The prince is right to 
castigate our pollution of the oceans 
with plastic rubbish. It is surely a sick 
bird that fouls its own nest. Many 
will also welcome his support for the 
conservation of the red squirrel in the 
United Kingdom. 

This attractively produced book 
may delight and stir those for whom 
the world has never been modern. 
For others, it might merit Viscount 
Castlereagh’s acerbic dismissal of Tsar 
Alexander I’s Holy Alliance of 1815: 
a “piece of sublime mysticism and 
nonsense”. 


Philip Stott is emeritus professor 
of biogeography in the University of 
London, UK. 

e-mail: sinfonial@mac.com 
CORRESPONDENCE 


Ease public concern 
over oil pipeline 


The need for environmental 
scientists to address the effects 
of tar-sands mining in Canada 
(Nature 468, 476; 2010) should 
be extended to the impacts of 
downstream operations. 

Communities across the 
northern High Plains region 
of the United States are 
concerned about the risks of 
piping crude oil from Canadian 
tar sands across ecologically 
sensitive prairie and through an 
important recharge zone of the 
Ogallala Aquifer — the route 
of the proposed Keystone XL 
pipeline through the Nebraska 
Sand Hills. 

The public debate is being 
conducted largely in the 
absence of scientific evidence 
about risks to water resources 
and aquatic ecosystems. This 
causes misinformation to 
circulate: for example, local 
stakeholders commonly 
believe that any oil released 
from a ruptured pipeline could 
contaminate the entire High 
Plains groundwater supply 
— based on the widespread 
misconception of an aquifer as 
an underground lake. Others, 
by contrast, believe that 
spilled oil would be harmlessly 
sequestered in the aquifer. 

Much of the blame for 
these misconceptions must be 
down to poor communication 
with the public by scientists. 
However, scientists themselves 
are often hampered from 
providing technical input 
because of their limited access 
to data — as has happened 
with the Keystone XL proposal. 
Important data pertaining to 
this have not been divulged 
to the public, such as the fluid 
chemical composition and the 
maximum pipeline leakage 
volumes. 

Disclosure of relevant data 
must be comprehensive if 
the risks associated with the 
pipeline are to be properly 
assessed. 

John B. Gates University of 
Nebraska-Lincoln, USA. 
jgates2@unl.edu 


Sustainable cities: 
seeing past the trees 


The problems facing our cities 
call for a holistic approach, not 
just for ecological solutions 
(Nature 468, 173; 2010). We also 
need to consider the resilience 
of the changes we make to the 
urban landscape in the name 
of sustainability (see www.urban-futures.org)
and strike a 
balance between the benefits and 
disadvantages of these strategies. 

Take street trees planted to 
improve biodiversity. They reduce 
air pollution by increasing particle 
deposition and replenishing 
oxygen, yet may also exacerbate 
it by reducing ventilation. They 
provide shade but limit passive 
solar heating. Their amenity value 
may be undermined by high 
costs for repairing infrastructure 
damaged by ground shrinking 
and swelling. Although they help 
to mitigate light pollution, trees 
are likely to increase lighting 
requirements, and although 
they store water they may need 
irrigating — and so on. 

A continuing positive outcome 
will depend on thoughtful 
assessment of the competing and 
shifting aspects of sustainability. 
Rob MacKenzie, Tom Pugh 
Lancaster University, UK. 
r.mackenzie@lancs.ac.uk 
Chris Rogers University of 
Birmingham, UK. 


Time to underpin 
Wikipedia wisdom 


Wikipedia, the world’s largest 
online encyclopaedia, is regarded 
with suspicion by some in the 
scientific community — perhaps 
because the wiki model is 
inconsistent with traditional 
academic scholarship (Nature 
468, 359-360; 2010). But the 
time has come for scientists 
to engage more actively with 
Wikipedia. 

Type any scientific term into 
any search engine and it is likely 
that a Wikipedia article will be 
the first hit. Ten years ago, it 
would have been inconceivable 
that a free collaborative website, 
written and maintained by 
volunteers, would dominate the 
global provision of knowledge. 
But Wikipedia is now the first 
port of call for people seeking 
information on subjects that 
include scientific topics. Like 
it or not, other scientists and 
the public are using it to get an 
overview of your specialist area. 

Wikipedia's user-friendly 
global reach offers an 
unprecedented opportunity for 
public engagement with science. 
Scientists who receive public 
or charitable funding should 
therefore seize the opportunity 
to make sure that Wikipedia 
articles are understandable, 
scientifically accurate, well 
sourced and up-to-date. 

Many in the scientific 
community will admit to using 
Wikipedia occasionally, yet few 
have contributed content. For 
society’s sake, scientists must 
overcome their reluctance to 
embrace this resource. 

Alex Bateman, Darren W. 
Logan Wellcome Trust Sanger 
Institute, Hinxton, UK. 
agb@sanger.ac.uk 


Guest authors: for 
contributors only 


In your Careers feature on 
tenured academic positions 
(Nature 468, 123-125; 2010), 
you recommend that those 
seeking tenure should “Name 
a senior department member 
as a co-author on your papers 
if you're in Europe”. But this 
regrettably common practice 
should be weeded out, not 
encouraged. 

Such a recommendation 
would be worthwhile if it were to 
motivate a genuine collaboration 
and result in a significant 
contribution from the senior 
scientist to the paper. 

Mark J. van Raaij Centro 
Nacional de Biotecnologia, Spain. 
mijvanraaij@cnb.csic.es 


Guest authors: no 
place in any journal 


In the pursuit of tenure, you 
encourage academics in Europe 
to include a senior department 
member as co-author on their 
papers (Nature 468, 123-125; 
2010). Regardless of any 
geographical limitation, it was 
very disappointing to see this 
endorsement of guest authorship 
published in Nature. 

Richard M. Glass Journal of the 
American Medical Association, 
Chicago, Illinois, USA. 
richard.glass@jama-archives.org 


Editor’s note Nature requests 
author contributions statements 
and does not condone guest 
authorship (see go.nature.com/uzgbki).
The intended meaning 
of the ambiguous sentence was 
that tenure-track scientists should 
seek out senior department 
members as collaborators and 
contributing co-authors. 


CONTRIBUTIONS 

Items for Correspondence 
may be sent to 
correspondence@nature.com
after consulting the author guidelines at
http://go.nature.com/cmchno.
They should be no longer than 350 words. 
Readers are also welcome 
to comment online on 
anything published in 
Nature: www.nature.com/nature.
OBITUARY 


Michael Tinkham 


(1928-2010) 


Physicist who helped to unravel the mysteries of superconductivity. 


Almost 100 years after superconductivity
was discovered in 1911,
the field has lost one of its finest 
contributors and certainly its most impor- 
tant contemporary articulator of how the 
phenomenon works. 

Throughout his life, Michael Tinkham, 
who died on 4 November, never lost his 
remarkable ability to recognize the essentials 
and explain them to the rest of us. A colleague 
of his once said you could take him data that 
looked like pigeon droppings and leave with 
flakes of gold. Irreverent perhaps, but legions 
of graduate students and postdocs shared this 
experience. Tinkham's classic book Introduc- 
tion to Superconductivity, first published in 
1975, remains to this day the definitive treatment
of the subject — making it accessible to 
a wide range of scientists and engineers. 

Tinkham started his academic career as an 
undergraduate at Ripon College in Wisconsin, 
near where he grew up. After graduating, he 
went to the Massachusetts Institute of Tech- 
nology in Cambridge, where he received his 
master’s and his PhD, before completing a 
postdoc at the University of Oxford, UK. 

When he returned to the United States 
in 1955, he took a faculty position at the 
University of California, Berkeley. It was 
here that he began to develop his life-long 
interest in superconductivity — the astonishing
property that some metals have at
very low temperatures of allowing current to
flow through them with no resistance. 

Tinkham recog- 
nized that many properties of solids might 
usefully be studied spectroscopically using 
far-infrared radiation — until this point, its 
use had been confounded by poor sources 
and detectors. Specifically, he suspected that 
changes in the absorption of this radiation 
would occur when solids become magnetic 
or superconductive. 

In the same way that electrons in atoms 
have energy levels, so do conventional met- 
als. However, for metals, the distribution of 
these energy levels is essentially continuous. 
A characteristic property of a supercon- 
ductor is that an energy gap forms in this 
continuous distribution. We now know 
that this gap results from electrons in the 
metal binding into pairs — a phenomenon 
fundamental to superconductivity — but 
in the mid-1950s, there wasn’t even direct 
evidence that such a gap existed. 

In 1956, Tinkham and fellow postdoc 
Rolfe Glover found the first direct evidence 
for this energy gap in the form of a sharp rise 
in the absorption spectrum of a supercon- 
ductor. They also noted that aspects of the 
absorption data were counter-intuitive. For 
instance, the amount of radiation absorbed 
didn’t just steadily rise as Tinkham and his 
group increased the energy of the radiation 
beyond that needed for any absorption to 
occur. It rose above the level of absorption 
one might expect for a non-superconducting 
metal before decreasing to the expected value. 
Tinkham loved to tell the story of how, when 
he mentioned these peculiar observations to 
John Bardeen, who was already working with 
Leon Cooper and Robert Schrieffer on what 
turned out to be the correct theory of super- 
conductivity (the BCS theory, proposed in 
1957), Bardeen simply commented that such 
behaviour was “not unexpected”. 

Bardeen was right: these observations were 
a direct consequence of the celebrated ‘coher- 
ence factors’ of the BCS theory. Bardeen and 
his colleagues soon established that when 
superconducting electron pairs are broken 
apart (for example, by radiation), instead of 
producing two separate electrons, a combination
of electrons and ‘holes’ results — a
‘hole’ being the conceptual and mathematical
absence of an electron. This observation,
along with other unusual phenomena, measured
for instance by passing sound waves
through superconductors, provided the first 
substantive experimental confirmation of the 
BCS theory. Bardeen, Cooper and Schrieffer
went on to win the 1972 Nobel Prize in
Physics. 


REAL-WORLD EFFECTS 

This spectacular role in the early history of the 
BCS theory behind him, Tinkham continued 
to work with far-infrared spectroscopy but 
also began to study the macroscopic quantum
behaviour of superconductors. Quantum
mechanics is normally thought of as important
only in the microscopic world of atoms,
but in superconductors it manifests itself in 
very large objects, such as in the superconducting
magnets used in magnetic resonance
imaging. After Tinkham took up a professorship
in 1966 at Harvard University in Cambridge,
Massachusetts, one question emerged
that remained of central interest to him: what 
is the nature and origin of resistance in a 
superconductor? Or put more simply, when is 
a superconductor really a superconductor? 

As it turns out, when superconductors 
are carrying a current, they don't stay in a
fixed macroscopic quantum state, but cascade
down from one energy level to another.
As energy is lost with each transition, this 
is equivalent to saying that superconductors
have resistance, although it is extremely
small under most conditions. In the latter 
stages of his career, Tinkham was examining 
the conditions under which these transitions 
happen and how they happen in very thin 
wires of a superconductor. 

Despite all these achievements, being elected 
to the US National Academy of Sciences and 
winning the prestigious Oliver E. Buckley 
prize of the American Physical Society, 
Mike was a modest man with an exceptional
sense of humour. The same legions
of students who witnessed his alchemy with 
data will remember how they first knocked 
nervously on his door to be greeted by a 
somewhat gruff “Come in”, only to learn that
he was a very warm and witty mentor. 

I was privileged to be in Mike's group at 
Harvard from 1968 to 1974. His breadth of 
skills — as part theorist, part experimentalist —
made him seem to me the archetype of
a complete physicist. 

Genomic hourglass 


Comparative genomics studies reveal molecular signatures of the controversial ‘phylotypic’ stage — a time when embryos 
of members of an animal phylum all look more alike than at other embryonic stages. SEE LETTERS P.811 & P.815 


BENJAMIN PRUD’HOMME & NICOLAS GOMPEL 


Most people would say that lizards and
mammals bear little resemblance to
each other. But not so the embryologist,
for, at a particular stage in development, 
the embryos of very different species may look 
much the same. Elsewhere in this issue, papers 
by Kalinka et al.¹ and Domazet-Lošo and Tautz²
offer a fresh perspective on this intriguing 
phenomenon. 

This is a topic with a long history. In 1828, 
the German biologist Karl von Baer, one of 
the fathers of embryology, reported how very 
similar the early embryos of different species 
can be³: “I have two small embryos preserved
in alcohol, that I forgot to label. At present I am
unable to determine the genus to which they 
belong. They may be lizards, small birds, or 
even mammals.” In fact, it was later observed 
that, over the course of development, the 
youngest embryos within an animal phylum 
often look very different, but progressively 
converge towards a similar form (described 
by von Baer and later dubbed the phylotypic 
stage), before they diverge again to achieve the 
tremendous diversity of adult forms. 

This pattern of morphological divergence 
among species during embryonic develop- 
ment resembles an hourglass*”. Its waist 
marks the phylotypic period during which 
the basic body plan of a given animal group 
is laid down. The existence and meaning of 
the hourglass model, however, have been the 
subject of heated controversy, in part because 
the model rests on subjective comparisons of 
animal likeness of shape®*. The contribution of 
Kalinka et al.¹ and Domazet-Lošo and Tautz² is
to report molecular signatures supporting the 
existence of the phylotypic stage in insects and 
vertebrates. 

To test the hourglass model, Kalinka et al.¹
(page 811) reasoned that, because the devel- 
opment of shape is directed by the expression 
of genes, variations in morphological pat- 
tern among species might be reflected in the 
dynamics of gene expression. The authors set 
out to test this idea by measuring differences in 
gene expression between various species of the 
fruitfly Drosophila. Using DNA microarrays, a 
technology that measures genome-wide gene 
expression, they first quantified levels of gene 
expression throughout embryogenesis for six 
distinct species of Drosophila. Next, they compared
the temporal expression profiles of all
the genes across the six species. 

Figure 1 | The developmental hourglass, as revealed by comparative genomics. Mid-embryogenesis is 
marked by the phylotypic stage, a period of minimal anatomical divergence between species, as illustrated 
for vertebrate species by the orange band. This stage is now shown by Kalinka et al.¹ to display minimal
gene-expression divergence between Drosophila species (left curve), and by Domazet-Lošo and Tautz²
to express the oldest gene set of the entire life cycle (right curve). The species depicted, left to right, are 
zebrafish, chick and mouse. (Images reproduced from refs 12-14.) 

A sophisticated statistical analysis of the data 
set revealed a pattern strikingly similar to the 
anatomical hourglass, in which the temporal 
gene-expression divergence among species 
is minimal around the ‘extended germband’ 
stage, which is classically regarded as the 
phylotypic stage in insects’® (Fig. 1). That is, 
the expression of genes that are active during 
the extended germband stage is evolutionar- 
ily more stable than that of genes active earlier 
and later during development. Remarkably, 
the genes that mostly conform to an hourglass 
pattern are those involved in developmental 
processes, whereas genes involved in non- 
developmental functions show more variable 
expression profiles across species. 

The detection of a phylotypic stage at the 
gene-expression level revives some long- 
standing considerations on the relationships 
between embryonic development (ontogeny) 
and evolution (phylogeny)'’. Ontogeny clearly 
does not recapitulate phylogeny, yet these two 
processes have intricate connections. It is
precisely the nature of these connections that
Domazet-Lošo and Tautz² (page 815) have
explored. 

Starting from the notion that developmental 
novelties might be enabled by the evolution of 
new genes, these authors sought to correlate 
the emergence of new genes (or gene families) 
with novelties in the anatomical development 
of a species — zebrafish, in their case. They 
used ‘phylostratigraphy’, an approach they had
developed previously, to parse the genome into 
classes of genes according to their evolutionary 
origin in the history of life (‘phylostrata’). For 
instance, the zebrafish genome includes genes 
that date back to the origin of cells, others that 
date to the evolution of animals and yet others 
that date to the evolution of vertebrates. Then, 
using DNA microarrays, the authors measured 
the relative contribution of each phylostratum 
to global gene expression (the ‘transcriptome’) 
at different time points in the zebrafish life 
cycle, thereby estimating the relative age of the 
transcriptome at each time point. 
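
One way to picture the age estimate described above is as an expression-weighted mean of phylostratum ranks at each time point. The sketch below is only an illustration and not the authors' code: the function name `transcriptome_age`, the rank convention (1 = oldest phylostratum) and all the toy numbers are assumptions.

```python
def transcriptome_age(expression, phylostratum):
    """Expression-weighted mean phylostratum rank for one time point.

    expression   -- dict mapping gene -> expression level (hypothetical values)
    phylostratum -- dict mapping gene -> rank, with 1 = oldest (assumed convention)
    A lower result corresponds to an evolutionarily older transcriptome.
    """
    total = sum(expression.values())
    if total == 0:
        raise ValueError("no expression at this time point")
    weighted = sum(level * phylostratum[gene] for gene, level in expression.items())
    return weighted / total


# Toy data: three genes from three phylostrata, measured at three stages.
ranks = {"geneA": 1, "geneB": 2, "geneC": 3}
stages = {
    "early": {"geneA": 10.0, "geneB": 10.0, "geneC": 20.0},
    "mid": {"geneA": 30.0, "geneB": 5.0, "geneC": 5.0},
    "late": {"geneA": 10.0, "geneB": 10.0, "geneC": 30.0},
}
ages = {stage: transcriptome_age(expr, ranks) for stage, expr in stages.items()}
oldest_stage = min(ages, key=ages.get)
```

In this toy data set the middle stage has the lowest index, mimicking the waist of the hourglass; real data would involve thousands of genes and many more time points.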

It turns out that genes of different evolu- 
tionary origins are expressed at different time 
points (Fig. 1). Strikingly, the stage classically 
viewed as the phylotypic stage in zebrafish is 
marked by the expression of the evolutionarily 
oldest transcriptome set, whereas earlier and 
later stages (including adult stages) express 
comparatively younger transcriptomes. 
Importantly, the authors identified a similar 
pattern in published microarray data for other 
organisms (fruitfly, mosquito and nematode), 
suggesting that their findings are generally 
applicable. 

By revisiting the subjective anatomical comparisons
of classical embryology using quantitative
genomics, these two studies¹,² have
revived the concept of the phylotypic stage 
with much-needed objectivity. Although they 
take very different approaches, it is remark- 
able that both studies identify genomics sig- 
natures of the phylotypic stage — in short, the 
phylotypic stage sees expression of the oldest 
gene set, which is maximally conserved across 
species. These results reinforce the notion that 
animal body plans emerged using novel sig- 
nalling and regulatory genes that arose at the 
inception of multicellular animal life, and that, 
once established, the gene-expression patterns 
underlying the specification of the different 
body plans have remained fairly invariant. 

This newly acquired molecular legitimacy 
does not, however, explain what establishes 
and maintains the hourglass pattern. Kalinka 
et al.¹ found that the hourglass pattern of
gene-expression variation is best explained by the
action of natural selection. This echoes the 
proposition that mechanistic constraints per- 
taining to the building of a shared body plan 
might explain the conservation observed at the 
phylotypic stage*”. 

A body plan is a particular organization of 
anatomical rudiments. The early embryonic 
specification of these rudiments, independently 
of one another, might take different evolution- 
ary roads. But the assembly of these elements 
into a functional body plan might require a 
tight and constrained orchestration of gene 
expression, reflected in the hourglass waist. 
Once coherently assembled, the connected 
elements make a stable evolutionary substrate
for an organism to explore new morphogenetic 
directions within the realm of the established 
body plan. 

With this work¹,², new avenues open up in
addressing a long-standing debate. Future 
comparative studies of the gene-regulatory 
networks and developmental events under- 
lying the phylotypic stage will certainly shed 
light on the raison d’être of this peculiar
embryonic period.
Hot entanglement


Quantum entanglement has been observed at low temperatures in both 
microscopic and macroscopic systems. It now seems that the effect can also 
occur at high temperatures if the systems are not in thermal equilibrium. 


VLATKO VEDRAL 


Quantum physics is usually thought to apply to
small systems at low temperatures. A standard
example would be the quantum dynamics of an
electron in a hydrogen atom. Atomic orbits of
electrons are roughly an ångström in size, that is,
comparable with electronic de Broglie wavelengths,
which characterize the extent over which 
electrons display a quantum wave-like behav- 
iour. More importantly, at low temperatures, 
the typical energies characterizing electronic 
jumps are hundreds of times larger than the 
thermal energy of the environment to which 
the system is exposed. This, in turn, means that 
the noise due to the environmental tempera- 
ture is negligible compared with the typical 
electronic-jump energies, and therefore that 
the noise does not spoil the system’s quantum 
behaviour. Writing in Physical Review Letters, 
Galve et al.¹ show that, contrary to the common
view, a macroscopic system at high temperatures
can also sustain quantum features.
It is interesting that similar considerations 
about the restriction of quantum phenom- 
ena to small systems at low temperatures 
can be made about the most quantum of all 
quantum effects: quantum entanglement. 
The term entanglement was coined by Erwin 
Schrödinger, who described it as “the characteristic
trait of quantum mechanics”. It refers
to a state of two or more quantum systems in 
which the systems are so intertwined that they 
behave like one — it is actually a mistake to 
think of the subsystems separately. Quantum 
systems become entangled when they interact 
with one another. In the past decade, extensive
theoretical and experimental research² has
shown that, no matter what systems we look 
at, a general rule says that if the interaction 
strength between the subsystems is larger 
than the thermal energy due to their coupling 
to the environment, entanglement should 
exist between these subsystems provided 
that they are in thermal equilibrium with the 
environment. 

Now Galve et al.¹ prove that this relationship
between temperature and entanglement is not 
valid for systems that are not in thermal equi- 
librium. Here, in fact, the news is very good 
for entanglement. The authors predict that 
nanomechanical oscillators can be entangled 
at much higher temperatures than previously 
thought possible. 

The basic intuition behind this result is 
as follows. When a system is not in thermal 
equilibrium, the temperature no longer pro- 
vides the relevant energy scale against which 
to compare the system’s quantum behaviour. 
What matters instead is an effective tempera- 
ture, which can be much lower than the abso- 
lute one. This effective temperature is obtained 
by multiplying the absolute temperature by the 
rate at which the system approaches equilib- 
rium divided by the driving frequency, the 
frequency of the signal with which the sys- 
tem is made to oscillate. Galve and colleagues 
demonstrate that this new condition for entan- 
glement — that the interaction between sub- 
systems should be compared with the thermal 
energy at the effective temperature — holds 
quite generally and is intuitively pleasing. It 
says that if we can drive the system to oscil- 
late within a shorter timescale than the time 
it takes to reach thermal equilibrium, then an 
entangled steady state can be attained at higher 
temperatures than the absolute one. 
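
The rule of thumb in the paragraph above can be written out numerically. What follows is a minimal sketch under stated assumptions, not the calculation of Galve and colleagues: the function names, the example rates and the coupling energy are hypothetical, and prefactors of order unity (including whether rates are angular) are ignored.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def effective_temperature(temperature_k, relaxation_rate_hz, driving_frequency_hz):
    """Absolute temperature scaled by (equilibration rate / driving frequency)."""
    return temperature_k * relaxation_rate_hz / driving_frequency_hz


def may_entangle(coupling_energy_j, temperature_k, relaxation_rate_hz, driving_frequency_hz):
    """Compare the coupling energy with the thermal energy at the effective temperature."""
    t_eff = effective_temperature(temperature_k, relaxation_rate_hz, driving_frequency_hz)
    return coupling_energy_j > K_B * t_eff


# Hypothetical numbers: a 100 K environment, slow relaxation, fast driving.
t_eff = effective_temperature(100.0, 1e3, 1e6)
coupling = K_B * 1.0  # coupling energy equivalent to 1 K (assumed for illustration)
driven = may_entangle(coupling, 100.0, 1e3, 1e6)
undriven = may_entangle(coupling, 100.0, 1e6, 1e6)
```

With these assumed values the effective temperature is a thousand times lower than the absolute one, so a coupling far too weak to beat 100 K of thermal noise still exceeds the effective thermal energy.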

The actual system that Galve et al. investi- 
gate — two macroscopic (harmonic) oscillators 
coupled to each other — is important because a 
number of laboratories are currently working 
with similar systems. For instance, Aspelmeyer 
and colleagues³ have created quantum states
in a movable nanomechanical mirror that is a
microgram in weight. The high-temperature 
entanglement envisaged by Galve et al. could 
be achieved by coupling two such mirrors to 
one another. My colleagues and I⁴ have shown,
using a different theoretical approach to that
of the present study¹, that such nanomechanical
entanglement should persist at temperatures
of about 20 kelvin. The hope now is that,
by using Galve and colleagues’ new ideas, the 
temperature can be pushed upwards to, say, 
100 kelvin. This would eliminate the current 
need for expensive and elaborate cryogenics 
to cool the oscillators. 

So, OK, we can in principle entangle nano- 
mechanical oscillators at high temperatures. 
Physicists will no doubt get excited because 
this realization will strengthen the evidence 
for the universality of quantum mechanics. 
But why should anybody else care? 

The most exciting macroscopic and ‘hot’
non-equilibrium systems we know are, of 
course, the living ones. We can, in fact, view 
any living system as a Maxwell’s demon, main- 
taining life by keeping its entropy low against 
the environmental noise — that is, by being 
far from equilibrium. The father of thermo- 
dynamics, Ludwig Boltzmann, himself viewed 
living systems in this way. Here is what he said 
on the matter: “The general struggle for exist- 
ence of living beings is therefore not a fight for 
energy, which is plentiful in the form of heat, 
unfortunately untransformable, in every body. 
Rather, it is a struggle for entropy that becomes
available through the flow of energy from the 
hot Sun to the cold Earth. To make the full- 
est use of this energy, the plants spread out the 
immeasurable areas of their leaves and harness 
the Sun's energy by a process as yet unexplored,
before it sinks down to the temperature level of 
our Earth, to drive chemical syntheses of which 
one has no inkling as yet in our laboratories.” 
We have actually learnt a little bit about that 
“unexplored” process — photosynthesis — 
since Boltzmann. And as it happens, recent 
experiments⁵ show a quantum effect leading
to entanglement⁶ in some photosynthetic
complexes. Such entanglement might yield 
an increased efficiency in the transfer and 
processing of energy in photosynthesis. The 
overall mystery of photosynthesis remains, but 
there is now evidence that quantum physics has 


Tumour stem cells switch sides


Tumour stem cells are proposed to be the source of tumour cells. It now emerges 
that they also give rise to the endothelial cells that line the tumour vasculature, 
mediating tumour growth and metastasis. SEE LETTERS P.824 & P.829


VICTORIA L. BAUTCH 


To grow, solid tumours need a blood supply.
They recruit new blood vessels mainly by
inducing the sprouting
of endothelial cells from external vessels and 
promoting the cells’ migration into the tumour. 
This ability, called the angiogenic switch, is 
required for tumour cells to invade surround- 
ing tissue and metastasize to distant sites — 
the deadly hallmarks of cancer’. In this issue, 
Wang et al.² and Ricci-Vitiani et al.³ show that,
in addition to recruiting vessels from outside, 
brain tumours produce endothelial cells for 
vessel formation from within. 

Recent research in tumour biology has 
focused on two main concepts. According 
to the first concept — vasculogenic mimicry 
— some tumour cells take on certain charac- 
teristics of vascular endothelial cells and line 
the tumour’s blood vessels’. The origin of such 
tumour cells is ill-defined: whereas one study” 
suggested that tumour stem cells show vascu- 
logenic mimicry, it is generally thought that 
tumour cells in the immediate environment 
of the nascent vessel are co-opted for the pur- 
pose. The co-opted cells are thought to retain 
most of their tumour-cell characteristics while 
acquiring a limited number of endothelial-cell 
features. 

The second concept — that some tumours 
originate from a tumour stem cell — has been 
controversial. According to this idea, tumour 
stem cells are both refractory to most tradi- 
tional therapies and capable of regenerating 
the tumour following treatment. The deadly 
brain tumour glioblastoma is thought to arise 
from tumour stem cells®. 

Wang et al.’ (page 829) and Ricci-Vitiani 
et al.’ (page 824) now reveal data that are rel- 
evant to both concepts, and provide strong 
evidence that a proportion of the endothelial 
cells that contribute to blood vessels in glioblas- 
toma originate from the tumour itself, having 
differentiated from tumour stem-like cells. 

Both groups note that a subset of endothe- 
lial cells lining tumour vessels carry genetic 
abnormalities found in the tumour cells 
something to do with it in a profound way. And
there are other instances in biology in which
quantum entanglement could be important⁷.
If this is a general trend in the biological world 
(and it is a big ‘if’), maybe Boltzmann was only 
half right: could it be that life does not just keep 
its entropy low, but rather, also aims to keep 
its quantum entanglement high if and when 
needed for an increased efficiency of energy 
transport? For now, the jury is still out.


Vlatko Vedral is at the Clarendon Laboratory, 
University of Oxford, Oxford OX1 3PU, UK, 
and the Centre for Quantum Technologies, 
National University of Singapore, 117543 
Singapore. 

e-mail: vlatko.vedral@qubit.org 


1. Galve, F. et al. Phys. Rev. Lett. 105, 180501 (2010).
2. Amico, L., Fazio, R., Osterloh, A. & Vedral, V. Rev. Mod. Phys. 80, 517-576 (2008).
3. Gröblacher, S., Hammerer, K., Vanner, M. R. & Aspelmeyer, M. Nature 460, 724-727 (2009).
4. Vitali, D. et al. Phys. Rev. Lett. 98, 030405 (2007).
5. Collini, E. et al. Nature 463, 644-647 (2010).
6. Sarovar, M., Ishizaki, A., Fleming, G. R. & Whaley, K. B. Nature Phys. 6, 462-467 (2010).
7. Arndt, M., Juffmann, T. & Vedral, V. HFSP J. 3, 386-400 (2009).


themselves (Fig. 1). For instance, a compara- 
ble proportion of a cell population expressing 
endothelial-cell markers and a population of 
neighbouring tumour cells harboured three or 
more copies of either the EGFR gene or other 
parts of chromosome 7. Such cell populations 
also shared a mutated version of the oncogene 
p53. Another indicator of the tumour origin of 
some tumour-vessel endothelial cells is that, as 
well as expressing characteristic endothelial- 
cell markers — such as von Willebrand factor 
and VE-cadherin — they expressed the non- 
endothelial, tumour marker GFAP. 

The researchers also present evidence that 
tumour-derived endothelial cells arise from 
tumour stem-like cells. They find that a glio- 
blastoma cell population that could differ- 
entiate into endothelial cells and form blood 
vessels in vitro was enriched in cells expressing 
the tumour-stem-cell marker CD133. More- 
over, Wang and colleagues show that a clone of 
cells derived from a single tumour cell, which 
expressed CD133 but not VE-cadherin, was
multipotent: in vitro, the cells differentiated 
into both neural cells (which eventually form 
tumour cells) and endothelial cells. 

On being grafted into mice, these cells 
formed highly vascularized tumours. More- 
over, even the progenitor cells from these 
tumours continued to form tumours and 
tumour-derived endothelial cells, suggest- 
ing that the multipotential characteristic had 
been maintained. Ricci-Vitiani et al. gained 
further insights by generating undifferentiated 
cell aggregates from human tumour-derived 
CD133-expressing cells and grafting them 
into mice. The internal vessels of the resulting 
tumours expressed human vascular markers, 
whereas more external vessels carried mouse- 
specific endothelial-cell markers. What's more, 
the authors found human endothelial cells in 
tumour vessels linking to the mouse vessels 
and delivering blood to the tumour. 

Wang et al.’ suggest that the differentia- 
tion of tumour stem-like cells into endothelial 
cells might be mediated by signalling pathways 
involving two proteins — vascular endothelial 
growth factor (VEGF) and Notch. The authors 
propose that Notch regulates the initial differ- 
entiation of tumour stem-like cells to endothe- 
lial progenitor cells, whereas VEGF selectively 
affects the differentiation of endothelial 
progenitors to tumour-derived endothelial 
cells (Fig. 1). 

Another team’ has also investigated the 
source of cells contributing to tumour vessels, 
and has shown that tumour stem-like cells 
cultured from human glioma tumours form 
endothelial cells in vitro. The authors detected 
channels lined with tumour-derived cells in 
mice transplanted with human tumours — a 
process they classify as vasculogenic mimicry. 
However, their analysis of the original human 
tumours was limited to marker expression, and 
so they could draw no firm conclusion about 
the relationship between the tumour cells and 
the endothelial cells. Similarly, other groups*” 
have presented evidence of genetic abnormali- 
ties common to tumour cells and endothe- 
lial cells, but their data did not distinguish 
among several potential mechanisms for the 
observations. 

What is the functional significance of a 
tumour origin for vascular endothelium? To 
address this question, Ricci-Vitiani et al.³
generated tumours in which the tumour-derived
vessels were susceptible to drug-mediated 
destruction. Following drug treatment, these 
tumours were smaller than control tumours 
and had fewer blood vessels. This indicates that 
blood vessels derived from tumours are crucial 
for tumour survival. 

The new work”” also defines the relationship 
between a tumour and the blood vessels with 
which it interacts. If a dedicated compartment
of some tumours provides a niche for stem cells 
that can give rise to functional blood vessels, 
there may be a less urgent need for tumour cells 
to undergo the angiogenic switch to recruit 
vessels, and stronger selective pressure on them 
to differentiate into endothelial cells. 

Moreover, these observations challenge the 
assumption that tumour endothelial cells are 
normal cells, and therefore lack the genetic 
instability that may be the basis of drug resist- 
ance in tumour cells. Consistent with this sug- 
gestion, earlier studies'””' showed that tumour 
endothelial cells over-duplicate centrosomes 
— cellular organelles involved in cell division 
— and possess elevated levels of chromosome 
abnormalities. Moreover, there seems to be a
link between increased activity of the signalling 
cascades that promote blood-vessel formation 
and chromosome abnormalities in endothelial 
Figure 1 | Tumour stem-like cells are multipotent. Tumour-derived stem cells are thought to give rise
to tumour cells. Wang et al.² and Ricci-Vitiani et al.³ propose that a portion of the vascular endothelium
that lines the tumour vessels in glioblastoma also arises from tumour stem-like cells. They show that the
genetic abnormalities (dots) seen in the tumour cells are also present in endothelial cells isolated from
the tumours. It seems that tumour stem-like cell differentiation to endothelial-cell progenitors occurs
through Notch-mediated signalling, and that further differentiation of endothelial-cell progenitors
into endothelial cells is mediated by the VEGF signalling pathway.


cells’*. Tumour cells may therefore promote 
genetic instability in tumour endothelial cells 
through two distinct mechanisms: by giving 
rise to them directly, or by sending a signal to 
a nearby endothelial cell. Thus, not only the 
tumour compartment, but also genetically 
unstable tumour endothelial cells, may con- 
tribute to drug resistance. 

Several compelling questions arise from the 
latest data””. First, how general is the differenti- 
ation of tumour stem-like cells into endothelial 
cells? Both studies focused on glioblastomas, 
and so the relevance of this pathway in other 
tumours of suspected stem-cell origin must 
also be determined. Other cell types of the 
underlying support tissue (stroma), such as 
fibroblasts, also play a part in tumour forma- 
tion and progression. Do tumour stem cells 
contribute to these non-endothelial stromal 
lineages, and, if so, under what conditions? 

It is also necessary to define the condi- 
tions that promote the differentiation of 
tumour stem-like cells to endothelial cells, 
and to determine the prevalence of this proc- 
ess within a given tumour environment. For 
example, does local shortage of oxygen trigger 
this differentiation? The present studies exam- 
ine the molecular pathways that regulate the 
formation of tumour-derived endothelium at a 
superficial level. Defining the relevant mecha- 
nisms thoroughly is an essential prelude to the 
design of new therapies. 

Finally, it will be crucial to determine how 
tumour-derived endothelial cells and vessels 
differ from their non-tumour counterparts in 
both morphology and function. Other studies”? 
have reported that, when cultured, endothe- 
lial cells isolated from tumours exhibit some 
properties of stem cells, with the assumption 
that these properties were acquired by signals 
from the tumour environment. In light of the 
present work, an intriguing alternative possi- 
bility is that endothelium derived from tumour 
stem-like cells contributes to the observed cell 
characteristics. This work** therefore high- 
lights yet another of the numerous ways in 
which tumours evade destruction: by contributing
to their own support system.
50 YEARS AGO


The July issue of Man contains 
several articles of general interest 
... A.D. Lacaille illustrates a 
number of very large British 
Acheulean coups de poing, and a 
puzzling rock-carving from the Val 
Camonica is discussed by Dr. Anati 
of Paris. The site is near where 

the great glacial valley debouches 
on to the north Italian plain, and 
many rock-carvings there have 
been known for a long time. They 
include animals and humans 
treated in a conventional manner 
somewhat recalling the Copper Age 
paintings of Las Figuras in south- 
west Spain. The little group in 
question seems to indicate either a 
phallic or a ritual scene. The author 
suggests a date for this art group 
somewhere towards the start of 

the first millennium B.C. Is not this
somewhat too early? 

From Nature 10 December 1960. 


100 YEARS AGO 


The Anatomy of the Honey Bee. 
By R. E. Snodgrass. — In this
modest pamphlet the author has 
given to entomologists an original, 
trustworthy, and excellently 
illustrated account of the structure 
of the honey bee ... Many volumes 
have been written on the honey 
bee, yet no surprise can be felt that 
Mr. Snodgrass has been able to 
add new points to our knowledge 
and to correct errors in the work of 
his predecessors ... He expresses 
scepticism as to certain positive 
statements that have been made on 
controverted details of physiology 
and reproduction; for example, 
“concerning the origin of the royal 
jelly or of any of the larval food 
paste ... we do not know anything 
about it.” There is a present-day 
tendency unduly to disparage the 
results obtained by former workers, 
and such a statement will strike
many readers as extreme. 

From Nature 8 December 1910. 


COSMOLOGY 


Hydrogen was not 
ionized abruptly 


When and how the first stars and galaxies ionized the primordial hydrogen atoms 
that filled the early Universe is not known. Observations with a single radio 
antenna are opening a new window on the process. SEE LETTER P.796 


JONATHAN PRITCHARD & ABRAHAM LOEB 


Four hundred thousand years after the Big Bang,
the Universe had cooled sufficiently for hydrogen
atoms to form. Hundreds of millions of years later,
the first stars and galaxies had produced ionizing
ultraviolet radiation that broke the hydrogen atoms
into their constituent electrons and protons.
This process, termed reionization, marks a
major cosmological phase transition. When this
transition occurred and how rapid it was are
important open questions’. On page 796 of this issue,
Bowman and Rogers’ implement a new technique
that allows them to rule out models in
which reionization occurs abruptly.

Their approach uses a simple radio antenna
operating at low frequencies to measure the
absolute radio intensity of the sky. Cosmic
hydrogen atoms can emit or absorb light with
a wavelength of 21 centimetres, a signal that
is stretched (redshifted) on its way to Earth
through the expansion of the Universe’. The
redshifted 21-cm hydrogen signal, which 
falls within the radio regime, is expected to 
cut off at short, observed wavelengths that 
correspond to later times when the Universe 
was ionized. The authors’ experiment to 
detect the global reionization step (EDGES) 
searches for the associated spectral step in the 
sky’s intensity’. 
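
The stretching of the 21-cm line can be made concrete with the standard redshift relation, in which the observed frequency equals the rest frequency divided by (1 + z). The sketch below is illustrative only: the function names are made up for this example, and the constants are the hydrogen-line rest frequency and the speed of light.

```python
REST_FREQUENCY_MHZ = 1420.405751768  # rest frequency of the hydrogen 21-cm line
SPEED_OF_LIGHT_M_S = 299_792_458.0   # speed of light in vacuum


def observed_frequency_mhz(redshift):
    """Frequency at which the 21-cm line emitted at a given redshift arrives today."""
    return REST_FREQUENCY_MHZ / (1.0 + redshift)


def observed_wavelength_m(redshift):
    """Observed wavelength in metres for the same signal."""
    return SPEED_OF_LIGHT_M_S / (observed_frequency_mhz(redshift) * 1e6)


def redshift_from_frequency(frequency_mhz):
    """Invert the relation: which redshift does an observed frequency probe?"""
    return REST_FREQUENCY_MHZ / frequency_mhz - 1.0


f6 = observed_frequency_mhz(6.0)
lam6 = observed_wavelength_m(6.0)
z_back = redshift_from_frequency(f6)
```

For a redshift of 6 the line arrives at roughly 203 MHz, a wavelength of about 1.48 metres, which is why a low-frequency radio antenna is the natural instrument for this search.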

Our knowledge of the epoch of reionization 
is surprisingly limited. The lack of ultraviolet 
(UV) absorption by diffuse neutral hydrogen 
along the line of sight to the most distant qua- 
sars° (accreting black holes) indicates that the 
Universe is largely ionized at a redshift of less 
than about 6 — a billion years after the Big 
Bang. Yet observations of the cosmic micro- 
wave background’ — radiation left over from 
the Big Bang — indicate that the Universe was 
filled with neutral hydrogen at much earlier 
times. Clearly, a transition must have occurred 
from a neutral to an ionized Universe, but even 
recent observations of high-redshift galaxies 
with the Hubble Space Telescope tell us little 
Figure 1 | The 21-centimetre cosmic hydrogen signal. a, Time evolution of fluctuations in the
21-cm brightness from just before the first stars formed through to the end of the reionization epoch.
This evolution is pieced together from redshift slices through a simulated cosmic volume’. Coloration
indicates the strength of the 21-cm brightness as it evolves through two absorption phases (purple
and blue), separated by a period (black) where the excitation temperature of the 21-cm hydrogen
transition decouples from the temperature of the hydrogen gas, before it transitions to emission (red)
and finally disappears (black) owing to the ionization of the hydrogen gas. b, Expected evolution of the
sky-averaged 21-cm brightness* from the ‘dark ages’ at redshift 200 to the end of reionization, sometime
before redshift 6. The frequency structure within this redshift range is driven by several physical processes,
including the formation of the first galaxies and the heating and ionization of the hydrogen gas. There is
considerable uncertainty in the exact form of this signal, arising from the poorly understood properties
of the first galaxies. Bowman and Rogers’ study the final phase, in which the progressive ionization of the
gas cuts off the signal.
about the galaxies that must have driven
reionization’.

Two major challenges for detecting the 
21-cm signal involve foregrounds and cali- 
bration. The cosmic signal is dwarfed by radio 
emission from the Milky Way, as well as by 
terrestrial radio emission. Despite the favour- 
able location of the EDGES experiment in the 
Australian outback, transmission from local 
radio and TV stations causes the loss of isolated 
regions of the spectrum. In addition, Galac- 
tic radio emission from energetic electrons 
spiralling in magnetic fields forms a spectrally 
smooth foreground that is one-thousand times 
brighter than the 21-cm signal. This smooth 
Galactic foreground can be fitted with a 
simple polynomial, and so removed, leaving 
the cosmic signal in the residuals. Unfortu- 
nately, this procedure removes much of the 
signal, potentially throwing the baby out with 
the bath water. 
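
The foreground-removal step can be sketched on synthetic data. This is a toy illustration, not the EDGES analysis: it assumes a pure power-law foreground, uses a straight-line fit in log-log space as a stand-in for the "simple polynomial", and injects a hypothetical 20-millikelvin step. The names `fit_power_law` and `residuals` and all numbers are made up for the example.

```python
import math


def fit_power_law(freqs_mhz, temps_k):
    """Least-squares straight line in log-log space: log T = a + b * log(nu)."""
    xs = [math.log(f) for f in freqs_mhz]
    ys = [math.log(t) for t in temps_k]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(freqs_mhz, temps_k):
    """Subtract the fitted smooth foreground and return what is left, in kelvin."""
    a, b = fit_power_law(freqs_mhz, temps_k)
    return [t - math.exp(a + b * math.log(f)) for f, t in zip(freqs_mhz, temps_k)]


# Synthetic sky: assumed foreground of 300 K at 150 MHz with spectral index -2.5.
freqs = [100.0 + 2.0 * i for i in range(51)]        # 100-200 MHz
foreground = [300.0 * (f / 150.0) ** -2.5 for f in freqs]
step = [0.02 if f < 150.0 else 0.0 for f in freqs]  # hypothetical 20 mK step
sky = [fg + s for fg, s in zip(foreground, step)]

clean = residuals(freqs, foreground)   # foreground only
with_signal = residuals(freqs, sky)    # foreground plus step
```

With the foreground alone the residuals vanish to numerical precision, whereas the injected step leaves a residual at the millikelvin level. Because the fit is free to tilt towards the step, part of the signal is absorbed into the fitted foreground, which is the loss the text warns about.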

Another important limitation of the current 
experimental set-up is the absence of a method 
for calibrating the frequency response of the 
radio antenna. This necessitates fitting a com- 
bination of the foregrounds and the antenna’s 
response. Given these limitations, it is impres- 
sive that the authors” are able to achieve 
residuals at the level of tens of millikelvin, com- 
parable to the expected signal, and to place weak 
constraints on the duration of reionization. 

Bowman and Rogers’ technique allows them 
to rule out only models in which reionization 
occurs most abruptly — corresponding to a 
redshift interval of less than 0.1. As yet, the 
technique has had little effect on most models 
of the reionization epoch and the first galaxies. 
Figure 1 shows the expected evolution of the 
Universe as traced by emission or absorption 
of the 21-cm spectral line®. There is an initial 
absorption regime where the hydrogen gas is 
cooling through its cosmic expansion, and the 
excitation temperature of the 21-cm transition, 
which characterizes the relative populations of 
its two energy levels, is held equal to the gas 
temperature by collisions between hydrogen 
atoms. This absorption dies away as the gas gets 
diluted. Then the first stars form and emit UV 
photons that again set the excitation tempera- 
ture of the 21-cm transition equal to the gas 
temperature, reinvigorating a second absorp- 
tion trough. As these stars die, some of them 
produce black holes whose X-ray emission 
is expected to heat the gas to above the tem- 
perature of the cosmic microwave background, 
pushing the 21-cm signal into emission. 

The authors’ focus their efforts on this final 
phase, in which the signal is seen in emission 
and the progressive ionization of the diffuse 
hydrogen gas cuts off the signal, indicating the 
end of reionization. The same technique could 
ultimately be applied to detecting earlier peri- 
ods for which our picture of the astrophysics 
is highly uncertain. 

In the meantime, considerable time and 
money is being dedicated to the construction 
of low-frequency radio interferometers such 
as MWA, LOFAR and PAPER, which will 
target spatial fluctuations in the 21-cm signal 
(Fig. 1a). The EDGES experiment represents 
a cheaper method for measuring only the
sky-averaged, broad-brush features in the evolution
of the signal. Despite its limitations, it opens 
the possibility of an alternative experimental 
avenue that should be pursued in parallel to 
the more ambitious interferometers. Bow- 
man and Rogers’ have taken the first step on 
this journey, which will hopefully lead to new 
insights about the first stars and galaxies and 
the reionization epoch.
many functions


The Lassa virus nucleoprotein coats the viral genome to make a template for RNA 
synthesis. A study shows that it also binds the ‘cap’ structure of cellular messenger 
RNAs and directs immune evasion using a novel mechanism. SEE ARTICLE P.779 


FELIX A. REY 


Lassa fever is a dreadful human haemorrhagic
disease caused by the Lassa virus,
a member of the Arenaviridae family’.
The disease is prevalent in West Africa, causing
5,000 deaths each year and infecting hundreds
of thousands more’. Arenaviruses are distributed
worldwide and cause persistent infection
in rodents, in which they generally don't cause
disease. Humans become infected by exposure
to material contaminated by infected mice,
for example when the animals infiltrate food
stores. The Lassa virus genome is a negative-sense,
single-stranded RNA (nsRNA) molecule,
and is coated by a nucleoprotein to form
a nucleocapsid, a complex in which multiple
copies of the nucleoprotein wrap around the
genomic RNA, each one contacting a fixed
number of nucleotides. In this issue (page 779),
Qi et al.* report the crystal structure of the
Lassa virus nucleoprotein, and reveal that it
has a striking array of activities.

The nucleocapsids of nsRNA viruses serve
as templates for the virus’ polymerase enzyme
(also known as the large or L protein), which
replicates the genome to make new infectious
particles. Qi and colleagues’ crystal structure’
shows that the Lassa virus nucleoprotein is
made of two domains (an amino-terminal
domain and a carboxy-terminal domain)
with a positively charged groove in between,
where the genomic RNA is expected to bind. 
This organization has been observed in all 
nsRNA viruses for which the nucleoprotein 
structure is known. 

Before replication, the polymerase tran- 
scribes the genome into messenger RNA mol- 
ecules to be translated into the viral proteins. 
Efficient translation of mRNAs by cellular 
ribosomes occurs if the mRNAs have a ‘cap’ 
structure at the 5’ end of the molecule. But are- 
naviruses, along with a subset of nsRNA viruses 
(those that have segmented genomes; Fig. 1, 
overleaf), cannot themselves cap mRNAs. 
They therefore steal caps from cellular mRNAs 
and transfer them to nascent viral transcripts, 
in a process known as cap snatching. Arenavi- 
ruses do this by cleaving off the 5’ end of cel- 
lular mRNAs using an ‘endonuclease’ activity 
that resides in the amino-terminal domain 
of the L protein, and then transferring the
mRNA fragment to nascent transcripts. 

Qi and colleagues’ structure of the Lassa 
virus nucleoprotein shows that its amino- 
terminal domain has a cap-binding site, which 
holds the 5’ end of cellular mRNAs in place 
while the L protein cleaves off the rest. This 
additional function of the arenavirus nucleo- 
protein has not been observed in counterparts 
of the protein from any other virus family. The 
authors show that when key residues in the
cap-binding site are mutated, transcription is 
impaired.

Furthermore, the structure shows that the carboxy-terminal domain
of the Lassa virus’s nucleoprotein is folded in the same way as cellular
3’-5’ exonucleases, the enzymes that remove nucleotides one at a time
from the 3’ end of RNA or DNA molecules, often completely degrading the
nucleic-acid molecules in the process. Indeed, one of the closest
structural homologues of the nucleoprotein’s carboxy-terminal domain is
the human DNA 3’-5’ exonuclease enzyme TREX1. What are the
implications of this?

The detection of foreign nucleic acids to induce production of type I
interferon (IFN) proteins is central to the innate antiviral defence of
cells; misregulation of this system causes autoimmune problems. TREX1 is
necessary for the degradation of single-stranded DNA derived from
endogenous retroelements (which constitute 90% of the approximately
three million transposable elements in the human genome). Such
single-stranded DNA accumulates in TREX1-deficient cells, inducing the
IFN response and causing autoimmune disease. Qi et al. identified the
amino acids of the 3’-5’ exonuclease active site of the Lassa virus
nucleoprotein by superposition of their crystal structure on that of
TREX1. Remarkably, these amino acids correspond to residues that were
recently shown to have a critical role in the IFN-counteracting activity
of the nucleoprotein of the lymphocytic choriomeningitis virus, the
best-studied arenavirus. This suggests that the 3’-5’ exonuclease
activity is the means by which arenavirus nucleoproteins inhibit IFN
induction.

The authors verified that the Lassa virus nucleoprotein does indeed
degrade short RNA molecules similar to those generated as by-products
during replication and transcription of the virus. They also showed that
the wild-type nucleoprotein inhibits IFN production in virus-infected
cells, whereas mutants devoid of exonuclease activity do not, even
though they still undergo replication and transcription.

Why are so many unrelated activities concentrated in a single protein?
The answer is probably related to the extreme compactness of arenavirus
genomes, which code for only four proteins: fewer than in any of the
other nsRNA virus families, and fewer than in any other human pathogenic
virus. Of these four proteins, two (the nucleoprotein and the L protein)
are present in all nsRNA viruses, and form the replicative foundation of
these viruses. The structure of the Lassa virus nucleoprotein thus also
provides further insight into the evolutionary history of nsRNA viruses,
as described below.

The amino-acid sequence of the arenavirus L protein has the signature of
RNA-dependent RNA polymerases (RdRps, enzymes that catalyse the
replication of RNA from an RNA template). L proteins are found in all
nsRNA viruses, with the exception of those of the Orthomyxoviridae
family, in which the polymerase is split into three smaller polypeptides
(PA, PB1, PB2) and functions as a heterotrimer containing these three
proteins: PA has the cap-snatching endonuclease site, PB1 acts as the
catalytic RdRp and PB2 has the cap-binding site. Qi et al. have now
shown that, in arenaviruses, the cap-binding site resides in the
nucleoprotein. The Bunyaviridae family of nsRNA viruses, meanwhile, has
endonuclease activity in the amino-terminal domain of the L protein, but
its cap-binding site has not yet been identified. Thus, the three
families of nsRNA viruses that have segmented genomes and that have been
studied in detail (Bunyaviridae, Orthomyxoviridae and Arenaviridae)
share a cap-snatching strategy for genome transcription. This is not the
case in the non-segmented viruses, in which the L protein has a capping
activity.

The phylogenetic diagram shown in Figure 1 is based on the conserved
RdRp modules from all nsRNA viruses, and shows how the different
families cluster according to whether or not they have segmented
genomes. Structural data show that the nucleoproteins from all
non-segmented nsRNA viruses have evolutionarily conserved folds, but
this isn’t the case for the segmented ones. So, although the
nucleoprotein structures of the bunyaviruses and the orthomyxoviruses
both contain two domains (an amino-terminal and a carboxy-terminal
domain, as seen for the Lassa virus nucleoprotein), the individual folds
of the domains are unrelated. By contrast, the amino- and
carboxy-terminal domains of the nucleoproteins of the Bornaviridae, the
Rhabdoviridae and the Paramyxoviridae (all of which have non-segmented
genomes) have a conserved three-dimensional fold, suggesting a common
ancestry, despite the absence of any detectable similarity in their
amino-acid sequences.

Figure 1 | Phylogeny of nsRNA viruses. This unrooted phylogenetic tree
of the known nsRNA viruses (adapted from ref. 20) is overlaid with
coloured ellipses representing the various virus families. The
ophioviruses constitute a genus rather than a family. Representative
human pathogens/diseases in each of the families are indicated in
parentheses. The diagonal dashed line separates the nsRNA families that
have a single RNA genomic molecule (non-segmented genomes) from those
that have several genomic segments. The arenaviruses have two genomic
segments, bunyaviruses have three, ophioviruses three or four (depending
on the virus) and orthomyxoviruses six to eight. Qi et al. report the
crystal structure of the nucleoprotein from the Lassa virus (a member of
the Arenaviridae family). The structure reveals an unexpected biological
function of the nucleoprotein, and casts fresh light on the evolutionary
history of nsRNA viruses.

Qi and colleagues’ crystal structure of an arenavirus nucleoprotein
illuminates the protein’s roles in the virus’s cycle, while adding to
our understanding of the evolutionary history of nsRNA viruses. It also
highlights the fact that each family of nsRNA viruses seems to have
developed different immune-defence strategies and shows that the
arenaviruses’ mechanism of immune evasion is a novel one. Considering
that fatal infections by pathogenic arenaviruses, and chiefly by the
Lassa virus, are characterized by a generalized immune suppression,
these new results have major implications for finding new ways to combat
these diseases.


Pluto is again a harbinger


New astronomical and laboratory data show that the abundances of the two 
dominant ices, nitrogen and methane, on the surfaces of the Solar System’s 
two largest dwarf planets are surprisingly similar — raising fresh questions. 


S. ALAN STERN 


Combining state-of-the-art telescopic
and ground-based laboratory data,
Tegler et al. have recently reported that
the proportions of nitrogen (N₂) and methane
(CH₄), the dominant surface ices on the two
largest dwarf planets, Pluto and Eris, are surprisingly
similar. More specifically, they found
that the N₂ and CH₄ abundances on Eris are
near 90% and 10%, respectively, and that those
on Pluto are 97% and 3%. Intriguingly, these
abundances are also similar to those on the
dwarf planet and Kuiper-belt escapee Triton,
which orbits Neptune.

Tegler and colleagues’ results, published in 
the Astrophysical Journal, represent the first 
quantitative comparison of the abundances 
of volatile ices on the surface of any bodies 
beyond Neptune. They have significant implications
for understanding Pluto and Eris, as
well as the Kuiper belt, the disk-shaped region
beyond Neptune's orbit where these two dwarf 
planets and other bodies reside (Fig. 1). The 
findings also provide reassurance that the 
detailed study planned for the Pluto system by 
NASA’s New Horizons mission, which is now
en route for a 2015 fly-by, will be of relevance 
to a broader suite of small planets common to 
the outer Solar System. 

Figure 1 | Pluto, Eris and the Kuiper belt. Tegler and colleagues’
demonstration that Pluto and Eris have similar surface abundances of
nitrogen and methane ices suggests that such abundances may be common,
or at least not uncommon, among large objects in the Kuiper belt, the
disk-shaped region beyond Neptune's orbit where the two dwarf planets
reside. White dots represent objects in the classical Kuiper belt.
Neither Centaurs (Kuiper-belt escapees) nor objects in the ‘scattered
belt’ beyond Pluto's orbit are shown. Other large dwarf planets smaller
than Pluto and Eris are also not shown.

The discovery of Pluto by Clyde William
Tombaugh in 1930 can be considered the
technical discovery of the Kuiper belt. But the
Kuiper belt's existence was firmly established
only in the 1990s, with the discovery of additional
bodies there. Interestingly, a wide variety
of attributes now known to be common to
many large Kuiper-belt objects were first identified
in studies of Pluto. These include Pluto's
rocky interior, its icy red surface, the presence
of its satellites, its high orbital inclination and
its resonant orbit with Neptune (Pluto's orbital
period is in the precise ratio of 3/2 of Neptune's
orbital period). As such, it is reasonable to refer
to Pluto as the harbinger of the Kuiper belt and
many of its key attributes.
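
The 3/2 ratio between the orbital periods of Pluto and Neptune mentioned above can be checked with a short calculation. The sketch below is illustrative only: the two orbital periods are approximate, rounded values assumed for the example (they are not given in the text), and the function name is hypothetical.

```python
from fractions import Fraction

# Approximate orbital periods in Earth years. These are assumed,
# rounded values for illustration; they do not come from the text.
PLUTO_PERIOD_YR = 247.9
NEPTUNE_PERIOD_YR = 164.8


def nearest_simple_ratio(period_a: float, period_b: float,
                         max_denominator: int = 5) -> Fraction:
    """Return the small-integer fraction closest to period_a / period_b."""
    return Fraction(period_a / period_b).limit_denominator(max_denominator)


ratio = PLUTO_PERIOD_YR / NEPTUNE_PERIOD_YR
resonance = nearest_simple_ratio(PLUTO_PERIOD_YR, NEPTUNE_PERIOD_YR)
print(f"period ratio = {ratio:.3f}, nearest simple fraction = {resonance}")
```

With rounded input periods the computed ratio only approximates 3/2; the small residual reflects the rounding of the assumed values rather than a departure from the resonance.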

The discoveries of CH₄ and N₂ ices on
Pluto were reported in 1976 and 1992, respectively.
By discovering similar abundances
of N₂ and CH₄ ices on Eris and Pluto, Tegler
et al. have shown that Pluto’s icy
surface composition may be common, or at
least not uncommon, among large Kuiper-belt
worlds, thereby demonstrating another
way in which Pluto seems to be a harbinger.
Furthermore, because both N₂ and CH₄ create
significant atmospheric vapour pressures
at characteristic Kuiper-belt surface temperatures,
an important implication of the
authors’ discovery is that tenuous N₂-CH₄
atmospheres such as Pluto’s (its atmospheric
pressure is conceivably a few tens of microbars)
may also be a common attribute among planets
in the Kuiper belt.

Yet Tegler and colleagues’ findings also 
raise new questions. A pivotal one is why 
some large Kuiper-belt worlds, such as Eris
and Pluto, display N₂ and CH₄ on their surfaces,
whereas others (even those that are
similar to Eris and Pluto in both size and location
in the Kuiper belt) display only H₂O ice
on their surface, with no trace of either N₂
or CH₄. A second, related, question concerns
comparisons between the surface compositions
of Pluto and Eris, and those of comets,
which themselves derive from, and are thought
to be the building blocks of, dwarf planets.
Although the CH₄ fractions on Pluto and Eris
are not unlike those seen in some comets, it
is puzzling that they display so much N₂ on
their surfaces when comets are apparently
uniformly N₂ poor.

The ongoing rapid advance of ground-based 
astronomical facilities offers hope that, within 
this decade, such questions will be answered. 
The obvious route would be to apply Tegler 
and colleagues’ methods — which involve 
both ground-based infrared spectroscopy 
and laboratory-based spectral studies of ice 
mixtures — to many more Kuiper-belt planets 
and smaller bodies. 

Adding to the likelihood that such ques- 
tions will be resolved in this decade are two 
important space missions now en route to their 
targets. One is the European Space Agency’s 
flagship Rosetta comet orbiter, which will 
make the most detailed and comprehensive 
exploration ever imagined of a comet (and
Kuiper-belt escapee). Rosetta will arrive at its
target, comet 67P/Churyumov-Gerasimenko,
in mid-2014. Then, just one year later, NASA’s
New Horizons mission will reconnoitre Pluto
and all three of its known moons in exquisite 
detail. 

Of particular relevance for surface-composition
studies is the fact that both missions
carry sensitive infrared mapping spectrometers.
These spectrometers will, for the
first time, reveal the distribution of N₂, CH₄
and many other compounds across the surface
of a dwarf planet and search for them
across a comet. What’s more, they will, by 
dint of the close proximity of their space- 
craft to the respective targets, also be able to 
look into the near-surface interiors of these 
representatives of comets and dwarf planets. 
This will be accomplished by examining the 
surface compositions of subsurface windows 
afforded by craters, fissures and exposed, ver- 
tically bedded layering where it is present on 
these bodies. 

Tegler et al. have revealed both compositional
insight into, and commonalities among,
the two largest planets of the Kuiper belt. It is 
up to future research teams, working with even 
more advanced facilities than those used by 
the authors, to address the questions that this 
discovery has raised, and to determine how 
much more diversity or commonality there is 
in surface composition among the planets of 
the Kuiper belt.

Greenland’s glacial basics


Sliding of the Greenland ice sheet is affected by the production of surface meltwater. 
A new theory shows that whether the result is a long-term speed-up or slow-down
of ice motion depends on the variability in melt input. SEE LETTER P.803 


MARTIN P. LÜTHI


The Greenland ice sheet is influenced by
warming of its environment in three
ways: higher melt rates at the surface,
faster ice loss to the ocean, and faster (or 
maybe slower) sliding over the base. Melting is 
well understood, and big leaps have been made 
recently in understanding how glacier calv- 
ing is influenced by warmer ocean currents. 
Our knowledge of sliding, however, lacks an 
essential factor — a universal relation linking 
sliding speed to the stress state at the glacier 
base, and to the main actor, pressurized 
subglacial water. 

In a milestone study on page 803 of this 
issue¹, Schoof provides a unified theory of
subglacial water drainage. He illustrates how 
the drainage system switches between dif- 
ferent modes while adapting to the variable 
input of surface water, and why variability in 
water input, rather than the total water volume, 
drives ice-sheet acceleration. 

The Swiss physicist and geologist Horace- 
Bénédict de Saussure suspected as early as 
1779 that water drives glacier motion². Many
observations³,⁴ confirm the more explicit
statement that the pressure of subglacial water
controls sliding processes. And because sur- 
face water from rain and melt finds its way to 
the glacier base, the subglacial water pressure 
varies within minutes — and with it ice velocity. 
On mountain glaciers, 20-80% of the displace- 
ment observed on the surface is attributable 
to sliding processes at the base, the rest being 
due to ice deformation. The extreme value of
more than 99% has been observed on West 
Antarctic ice streams, whereas at the other 
end of the scale are extended areas in which 
the ice sheets are frozen to the bed. The only 
direct measurement made in Greenland (60%) 
is not representative of this ice sheet, because 
the drill site concerned was near a fast-paced 
outlet glacier⁵.

How future changes in precipitation and 
melt will affect subglacial water pressure, and 
therefore ice velocity, is crucial for predict- 
ing the future evolution of the Greenland 
ice sheet. If more meltwater input increases 
basal motion, more ice will be transported to 
lower elevations, leading to drawdown of the 
ice-sheet surface, and so to further melting. 
Rapid ice loss at rates exceeding previous estimates
would thus be conceivable⁶. But current
understanding of subglacial hydrology and
basal motion does not support the idea that
ice-sheet sliding increases with the amount
of water production during the summer melt
season. Schoof’s model calculations¹, based on
the unified description of subglacial hydrology 
shown in detail in the online Supplementary 
Information for his paper, suggest that propo- 
nents of both ideas are right — at least partly. 

The unification offered by Schoof concerns 
two major types of subglacial drainage system: 
a distributed system of linked cavities⁷, and a
system of channels within the ice or along the
ice-bed boundary⁸. The distributed system
consists of cavities that form in the lee of pro- 
trusions in the glacier bed, and that grow with 
water pressure and with sliding speed. This 
system operates at high water pressure caused 
by inefficient drainage of water. On the other 
hand, channels become enlarged through 
melting of surrounding ice, and can grow with- 
out limit because dissipative heat production 
increases with water flux at high pressure gra- 
dients. Big channels therefore operate at lower 
pressure and grow at the expense of smaller 
channels, which leads to the evolution of an 
arborescent structure similar to an arterial 
network. 

Both types of drainage system adapt to 
changes in water input within hours to days, 
and collapse by the inward creeping motion of 
the ice when the water pressure drops below 
the ice-overburden pressure. The switch 
from a linked cavity system to an arborescent 
channel network is beautifully illustrated in 
Schoof’s animations, which are part of his 
Supplementary Information. 

To understand the consequences of the 
switch in drainage-system configuration, con- 
sider the effect of a sudden increase in water 
discharge on ice-sheet motion. If additional 
surface water reaches a distributed system, 
water pressure increases, large patches of ice 
are separated from the bed, and the contact 
area between ice and bed is diminished. The 
result is less friction, and faster sliding of ice 
over the bed, which again leads to growth of 
cavities. A sudden increase in water input has a
similar initial effect on a channelized drainage 
system. But higher pressure gradients imme- 
diately lead to higher discharge, and to a rapid 
adaptation to the new water input by channel 
enlargement. Once fully adapted, the bigger 
channel dimensions even lead to lower water 
pressure, which drives more water from the 
surroundings to the channel. The net effect is 
better coupling to the bed and slower sliding 
motion. 

Schoof’s results¹ highlight an important
effect that has been largely ignored until 
recently. Surface water is mainly released in 
pulses — through daily melt, rain or break- 
through drainage from lakes on the ice surface 
— which are usually shorter than the timescale 
of channel enlargement. Such brief pressure 
pulses drive water from channels into the
surrounding distributed system, with accelerated
sliding being the result. To sum up, releasing
a certain amount of water at a steady rate
leads to initial acceleration, and subsequent
deceleration, whereas release in pulses leads to
episodic speed-up events that add up to larger
displacements.

Figure 1 | Exploring a subglacial artery. Glaciologists from the University Centre in Svalbard exploit
a rare circumstance in investigating a major drainage channel under a glacier (the Rieperbreen glacier
on Svalbard, an archipelago in the Arctic). Subglacial channels such as this are efficient pathways for the
escape of highly pressurized water that would otherwise lead to fast sliding motion of the glacier over the
bed. They are transient features that form during the melt season (when surface water penetrates to the
glacier base) and that slowly collapse in winter.

The main obstacle to successful application 
of the proposed drainage-system model to big 
ice sheets is the nearly complete lack of field 
data from which to determine model para- 
meters. The remoteness of the subglacial envi- 
ronment makes conditions under an ice sheet 
notoriously difficult to observe (except in rare 
situations, as shown in Fig. 1). Measuring basal 
motion and water-pressure variations in ‘wet- 
based’ parts of the Greenland ice sheet requires 
drilling through 500-2,500 metres of ice — an 
expensive endeavour that has been successful 
in only four locations⁵,⁹,¹⁰. Driven by the pressing
need to predict the future behaviour of the
Greenland ice sheet, several current drilling 
projects aim to observe the basal drainage 
system in action. 

Schoof’s paper¹ constitutes a notable
advance in understanding subglacial processes.
However, development of a complete
theory will require a fuller knowledge of several
other factors. The motion of a glacier over
its substrate is a spatially distributed phenome- 
non that involves frictional processes at a wide 
spectrum of scales — solid friction between 


ice and bedrock; granular friction within sed- 
iments ranging in size from silt to boulders; 
friction between sediment and bedrock; and 
drag on the ice when it flows around obstacles. 
And there is another hydrological effect to be 
taken into account: diffusion of highly pres- 
surized subglacial water through sediments 
influences their rheology in a time-dependent 
manner. Even after 50 years of ingenious 
experimental and theoretical advances, much 
remains to be done, in terms of both fieldwork 
and theory.


Martin P. Lüthi is in the VAW Glaciology
Group, ETH Zürich, 8092 Zürich,
Switzerland.

e-mail: luethi@vaw.baug.ethz.ch 


1. Schoof, C. Nature 468, 803-806 (2010).
2. de Saussure, H.-B. Voyage dans les Alpes (4 vols) (Fauche, 1779-1796).
3. Clarke, G. K. C. Annu. Rev. Earth Planet. Sci. 33, 247-276 (2005).
4. Cuffey, K. M. & Paterson, W. S. B. The Physics of Glaciers 4th edn (Elsevier, 2010).
5. Lüthi, M. P., Funk, M., Iken, A., Gogineni, S. & Truffer, M. J. Glaciol. 48, 369-385 (2002).
6. Parizek, B. R. & Alley, R. B. Quat. Sci. Rev. 23, 1013-1027 (2004).
7. Kamb, B. Rev. Geophys. Space Phys. 8, 673-728 (1970).
8. Röthlisberger, H. J. Glaciol. 11, 177-203 (1972).
9. Thomsen, H. H. & Olesen, O. B. Rap. Grønl. Geol. Unders. 152, 36-38 (1991).
10. Iken, A., Echelmeyer, K., Harrison, W. D. & Funk, M. J. Glaciol. 39, 15-25 (1993).


9 DECEMBER 2010 | VOL 468 | NATURE | 777 


© 2010 Macmillan Publishers Limited. All rights reserved 


J. GULLY 


ARTICLE 


doi:10.1038/nature09605 


Cap binding and immune evasion 
revealed by Lassa nucleoprotein structure 


Xiaoxuan Qi, Shuiyun Lan, Wenjian Wang, Lisa McLay Schelde, Haohao Dong, Gregor D. Wallat, Hinh Ly, Yuying Liang
& Changjiang Dong


Lassa virus, the causative agent of Lassa fever, causes thousands of deaths annually and is a biological threat agent, for 
which there is no vaccine and limited therapy. The nucleoprotein (NP) of Lassa virus has essential roles in viral RNA 
synthesis and immune suppression, the molecular mechanisms of which are poorly understood. Here we report the 
crystal structure of Lassa virus NP at 1.80 Å resolution, which reveals amino (N)- and carboxy (C)-terminal domains
with structures unlike any of the reported viral NPs. The N domain folds into a novel structure with a deep cavity for 
binding the m7GpppN cap structure that is required for viral RNA transcription, whereas the C domain contains 3’-5’ 
exoribonuclease activity involved in suppressing interferon induction. To our knowledge this is the first X-ray crystal 
structure solved for an arenaviral NP, which reveals its unexpected functions and indicates unique mechanisms in cap 
binding and immune evasion. These findings provide great potential for vaccine and drug development. 


Several arenaviruses, including Lassa virus (LASV), can cause severe
viral haemorrhagic fevers in humans with high morbidity and mortality,
for which there is no vaccine and limited treatment. These pathogenic
arenaviruses are public health threats and potential biological threat
agents. LASV, like other arenaviruses, is a single-stranded ambisense
RNA virus with two genomic RNA segments encoding four genes. The
NP encapsidates viral genomic RNAs into ribonucleoprotein (RNP)
complexes and is required for both RNA replication and transcription.
Like bunyaviruses and orthomyxoviruses, arenaviruses snatch
the cap structure of cellular mRNAs to use as primers to initiate viral
transcription, the exact mechanism of which is unknown. The cap-snatching
mechanism of arenaviruses seems to be unique, as evidenced
by the cytoplasmic localization and the much shorter 5’ non-templated
mRNA sequences. Severe arenavirus infections including lethal
Lassa cases are associated with a generalized immune suppression in
the infected hosts, the exact mechanism of which is unclear but is
thought to involve NP’s ability to suppress the induction of type I
interferon (IFN). To address the functional mechanisms of NP in
viral RNA synthesis and host immune suppression, we set out to deter- 
mine the crystal structure for LASV NP, knowledge derived from which 
can be extended to other arenavirus NP proteins, as all known arenaviral 
NP proteins share high sequence identity (Supplementary Fig. 1). 


Structure determination 


The full-length 569-residue LASV NP protein (Josiah strain) was 
expressed and purified as a recombinant MBP fusion protein in 
Escherichia coli as described in Methods. The purified protein exists 
mainly in two forms, with a majority in trimeric and some in hexameric 
form. Both forms bind random RNAs, which are longer and more 
abundant in the hexamers than in the trimers, a feature that is similar 
to known NPs from negative-strand RNA viruses. We attempted to
crystallize both forms, but only the trimeric NP formed crystals. The
crystals showed heavy twinning with a twin fraction of ~0.43 and the
reflection intensity statistic |E²-1| of 0.681/0.681. Initial phases were
obtained in a space group of P321 using multiple-wavelength
anomalous diffraction (MAD) with a samarium derivative. The true
space group was P3 with three subunits in an asymmetric unit. The
structure was refined to a resolution of 1.80 Å with de-twinning. The
crystal structures do not contain RNA, indicating that only RNA-free
NP was able to form crystals. The final structural model of the native
LASV NP has an R_factor of 0.18 and an R_free of 0.20. Data collection,
phasing and refinement statistics are provided in Supplementary Table 1.


Overall structure of LASV NP protein 

In the NP protomer structure, 514 residues of the 569-residue LASV 
NP protein were built into the model (Fig. 1a). The electron densities 
for residues 1-6, 147-157, 339-363, 518-521, 562-569 were not well 
defined. The LASV NP protomer, like other viral NPs, is composed of
the N- and the C-terminal domains, but neither domain shows structural
similarity to any known viral NPs (Supplementary Table 2). The
large N domain (residues 7-338) consists mainly of α-helices and
coils, whereas the C domain (residues 364-561) forms a typical
α/β/α sandwich architecture (Supplementary Text 1). In the trimeric
form, three subunits lie in a head-to-tail orientation to form a ring-shaped
structure with a three-fold symmetry (Fig. 1b and Supplementary
Fig. 2). Surface rendering reveals a deep cavity located near the
bottom of the N domain and a large cavity at the top of the C domain
(Fig. 1c, d), which are the cap-binding site and the 3’-5’ exoribonuclease
active site (see below), respectively. The interface area between
the subunits is 455 Å², representing 1.9% of total surface area of a
subunit (23,343 Å²). The central hole of the trimeric structure is 23 Å
in diameter, whereas the head ring is 98 Å and the body ring is 118 Å
(Supplementary Fig. 2).
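
The interface fraction quoted above follows directly from the two areas given in the text. The short sketch below simply reproduces that arithmetic; the function and variable names are hypothetical and chosen for illustration only.

```python
# Areas quoted in the text for one LASV NP subunit, in square angstroms.
INTERFACE_AREA = 455.0       # area buried between neighbouring subunits
SUBUNIT_SURFACE = 23_343.0   # total surface area of one subunit


def interface_percentage(interface: float, total: float) -> float:
    """Return the interface area as a percentage of the total surface area."""
    return 100.0 * interface / total


percentage = interface_percentage(INTERFACE_AREA, SUBUNIT_SURFACE)
print(f"interface = {percentage:.1f}% of the subunit surface")
```

Rounded to one decimal place, the result agrees with the 1.9% stated in the text.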


LASV NP is a 3’-5’ exoribonuclease 

A Dali search (http://ekhidna.biocenter.helsinki.fi/dali_server) identified
several structures similar to the C domain of NP, including several known
3’-5’ exonucleases/exoribonucleases in bacteria and humans (for
example, human TREX1) (Supplementary Text 2), all of which belong
to the DEDDH subfamily of the DEDD (DnaQ) superfamily. The
human TREX1 structure shows two Mn²⁺ cations in the active site. We
identified one Mn²⁺ in each subunit of LASV NP by crystal fluorescent
scanning, but could not identify the second Mn²⁺, possibly because it
was not well ordered in the absence of the RNA substrate. The C domain
of NP superimposes well with the portion of TREX1 that coordinates the
Mn²⁺ cations (Fig. 2a); in particular, the β5, β6, β7, β8 and β9 strands of
NP completely overlap with the central β-sheets of TREX1. The putative
exonuclease catalytic residues D389, E391, D466, D533 and H528 are
absolutely conserved in all known arenavirus NP proteins and are
located at identical positions as in the TREX1 active cavity (Fig. 2b).
Taken together, the structural evidence indicates that LASV NP is a new
member of the DEDD 3’-5’ exonuclease superfamily.

Figure 1 | The crystal structure of LASV NP protein. a, Cartoon diagram of
the LASV NP protomer. The N domain is in cyan with the cyan sphere
indicating the N terminus; the C domain is in orange with the orange sphere
indicating the C terminus. The black sphere shows Mn²⁺, whereas the blue
sphere shows Zn²⁺. The dotted lines represent the disordered loops. b, The
ring-shaped structure of LASV NP trimer. The first protomer is coloured as in
a, the second protomer is in blue and the third is in magenta. The groove and
the interface are indicated by arrows. c, Electrostatic surface potential map of
the NP protomer. The entrance of the cap-binding cavity is shown as a white
dotted circle. The blue area represents positively charged residues and the red
area represents negatively charged residues. d, Electrostatic surface potential
map of the 3’-5’ exoribonuclease cavity. The black sphere represents Mn²⁺.

We conducted in vitro assays to characterize the 3'-5' exonuclease
activity of the wild-type LASV NP, as well as NP mutants at putative
catalytic sites. We showed that the wild-type protein, in its trimeric or
hexameric form, could digest both DNA and RNA substrates (Supplementary
Figs 3 and 4). As divalent cations are essential for exonuclease
activity, we determined which divalent cation was most effective
for NP exonuclease to digest various single-stranded RNA (ssRNA)
species that are based on the NP gene in the viral genomic sense
(60 nucleotides, vRNA), complementary antigenomic sense (30 nucleotides,
cRNA), or in capped mRNA form (126 nucleotides, mRNA)
(Methods). The order of efficiency was Mn²⁺ > Co²⁺ >
Mg²⁺ > Ca²⁺ > Zn²⁺ > Fe²⁺ > Ni²⁺ > Cu²⁺ (Supplementary Fig. 5).
Wild-type NP could cleave various ssRNA species efficiently (Fig. 2c),
regardless of whether they contained a hydroxyl (5'OH) group, a
triphosphate (5'ppp) or a cap at the 5' terminus (Methods). In contrast, the NP
catalytic mutants (D389A, E391A and D466A) showed markedly
reduced RNase activity (Fig. 2c and Supplementary Fig. 4). In addition,
we showed that wild-type NP, but not its catalytic mutants, could
digest cellular RNA substrates in vitro with a preference towards short
RNA species over long ones (for example, 18S rRNA versus β-globin
mRNA, the small versus large fragments in the RNA ladder)
(Supplementary Figs 6 and 7). We also demonstrated that wild-type NP,
but not its catalytic mutants (D389A, E391A and D466A), can efficiently
degrade various dsRNA molecules, including 5'-hydroxyl (5'OH),
singly 5'-triphosphorylated (5'ppp/5'OH) and doubly 5'-triphosphorylated
(5'ppp/5'ppp) dsRNAs, as well as the long dsRNA mimic poly(I:C) (Fig. 2d and
Supplementary Fig. 8).

Figure 2 | The C domain of LASV NP is a 3'-5' exoribonuclease.
a, Superimposition of the C domain (orange) with human TREX1 protein
(green) reveals a high degree of similarity between the two structures. Mn²⁺ is
in black in LASV NP and red in TREX1; Zn²⁺ is in blue. b, The exonuclease
catalytic residues of LASV NP and TREX1 are located in identical positions,
and are shown in orange for NP and green for TREX1. c, The exoribonuclease
activities of the wild-type (WT) and mutant LASV NP with different ssRNAs as
substrates. Control 1 contains 10 mM EDTA and no NP. Control 2 contains
10 mM EDTA and NP. d, Comparison of the wild-type and NP catalytic
mutants in degrading the dsRNA substrates, the 5'-hydroxyl dsRNA (top),
double 5'-triphosphorylated dsRNA (middle), and the single
5'-triphosphorylated dsRNA (bottom).

Fluorescence scanning analysis identified a zinc ion in the NP
structure, despite the fact that no typical zinc finger motif was predicted
from the amino acid sequence and that no zinc compounds
were used during the purification and crystallization processes.
Although the residues C506, C529, H509 and E399 that coordinate
the zinc ion do not form a typical zinc-binding motif, they appear to
adopt a zinc finger fold in the structure. The CCHE zinc-binding site is
located in the C domain near the 3'-5' exonuclease active site
(Supplementary Fig. 9). We speculate that zinc binding may be required to
stabilize the structure of the C domain and/or contribute to the substrate
binding and specificity of the exonuclease activity. A highly
positively charged groove located between the N and C domains is
predicted as the genomic RNA-binding site (Supplementary Fig. 10).
An in vitro assay confirmed that RNAs are bound within the purified
NP oligomers and protected from its intrinsic exonuclease activity
(Supplementary Figs 10 and 11, Supplementary Text 3 and Methods).


Exonuclease and immune evasion 


To determine whether the exoribonuclease activity is important for
the transcriptional function of NP, we generated alanine substitutions
at five putative catalytic sites, D389A, E391A, D466A, D533A and
H528A, in the mammalian cell expression vectors of either native or
Myc-tagged NP gene, and examined the activity of each mutant in
transcribing the LASV minigenome RNA that encodes a Renilla luciferase
(RLuc) reporter gene (Methods). As shown in Fig. 3a, each NP
mutant expressed comparable protein levels to the wild type, and led 
to similar folds of increase in RLuc activity, indicating that these 
mutations did not alter the overall structure (Supplementary Text 4 
and Supplementary Fig. 12) or affect the basic function of NP in 
mediating viral RNA transcription. 

We next examined whether the exoribonuclease activity is required
for NP's function in the suppression of IFN. As expected, wild-type
NP strongly inhibited Sendai-virus-induced IFN-β activation in
a promoter assay (Methods), whereas all the catalytic mutants D389A,
E391A, D466A, D533A and H528A showed a complete loss of function
at a low level of transfected expression vectors (10 ng) and
showed various levels of deficiency at higher levels (Fig. 3b and
Supplementary Fig. 13). Our results confirm a previous study showing
that the D389 residue of LASV NP, as well as its corresponding residue
D382 in the prototypic arenavirus lymphocytic choriomeningitis virus
(LCMV), is required for IFN suppression but not for viral RNA
transcription, and may help to explain the loss of IFN suppression for
Tacaribe virus NP (Supplementary Fig. 14 and Supplementary Text 5).
In summary, these data provide strong genetic evidence for an important
role of the NP exoribonuclease activity in suppressing IFN
induction.

Figure 3 | The exonuclease activity of NP is important for blocking the IFN
induction. Results shown are the average (n = 3) with error bars indicating the
standard deviations. a, The NP catalytic mutants were expressed at similar
levels to the wild type in mammalian cells and had similar transcriptional
activities in the LASV minigenome assay. b, The NP catalytic mutants were
defective in suppressing the Sendai-virus (SeV)-induced IFN induction by a
LUC-based IFN-β promoter assay. c, The NP catalytic mutants were defective
in suppressing the IFN production induced by the immunostimulatory RNAs
poly(I:C) and Pichinde-virion-associated RNAs.

Viral infections are usually detected by the cellular pattern-recognition
receptors (PRRs) such as toll-like receptors (TLRs) and the cytosolic RNA
sensors retinoic-acid-inducible gene-I-like helicase (RIG-I) and
melanoma differentiation-associated protein 5 (MDA5), which recognize
the pathogen-associated molecular pattern (PAMP) RNA ligands and
initiate signalling pathways to induce the production of type I IFNs.
We hypothesize that NP prevents the virus-induced IFN induction by
degrading the PAMP RNA ligands that otherwise would trigger the
viral sensors in the cells.

We examined whether the NP RNase function is essential for suppressing
the IFN production induced by the immunostimulatory RNAs,
that is, poly(I:C) and the virion RNAs extracted from Pichinde virus,
which is a prototypic arenavirus. We found that whereas wild-type NP
efficiently inhibited the IFN-β activation induced by poly(I:C) or by
Pichinde-virion-associated RNAs, none of the five catalytic mutants
(D389A, E391A, D466A, D533A and H528A) exhibited any suppressive
activity (Fig. 3c). Similar results had been reported for LCMV NP.

We have shown that the NP exoribonuclease activity is essential for 
suppressing both viral-infection-induced and immunostimulatory- 
RNA-induced IFN production. A good example of exonuclease-mediated
suppression of IFN production has been demonstrated
for human TREX1 protein, which degrades small ssDNAs and
dsDNAs accumulated during cellular apoptosis. Failure to clear these
DNA fragments by TREX1 natural mutants leads to the activation of
cellular DNA receptors to trigger a persistent production of IFNs that
contributes to human autoimmune diseases. How does the NP
RNase activity function in suppressing the virus-induced IFN pro- 
duction? A simplistic but reasonable model is that the NP RNase 
activity is able to remove viral PAMP RNAs that are otherwise recog- 
nized by the cellular PRRs. Although we have shown that LASV NP 
protein can degrade various RNA templates in vitro, we believe that 
the NP RNase activity must be highly regulated in vivo, as NP does 
not cause a generalized nonspecific RNA degradation process of cel- 
lular or viral RNAs in the cells (Supplementary Text 6 and Sup- 
plementary Fig. 15). We propose that the NP RNase activity in the 
cells is restricted to viral PAMP RNAs through a yet-to-be characterized 
regulatory mechanism. A recent publication has shown a direct 
protein-protein interaction of NP with RIG-I and MDA5 (ref. 34),
which may be one possible mechanism for the specific nuclease activity 
of NP against these PRR-associated PAMP RNAs. 


LASV NP is a cap-binding protein 


The N domain adopts a completely novel fold not found in the Dali
server. To identify the cap-binding residues in the deep cavity of the N
domain, we attempted soaking and co-crystallization of
LASV NP with m7GpppG, triphosphorylated, diphosphorylated or
monophosphorylated ribonucleotides (Methods). We observed
clear density for the triphosphate and partial density for uridine
(Supplementary Fig. 16) in the triphosphorylated ribonucleotide
complex structures. We also visualized the structure of NP in complex
with dTTP, with a clear original Fo − Fc electron density contoured at
2.5σ for dTTP (Fig. 4a). The triphosphate group of dTTP was bound
in the middle of the cavity in the same manner as that of UTP
(Supplementary Fig. 16), anchored by salt bridges
formed with the side chains of the conserved residues K309, R300,
R323 and K253. At the deep end of the cavity, thymidine occupied a
hydrophobic pocket composed of residues F176, W164, L172,
M54, L120, L239 and I241. We propose that this dTTP-binding
pocket is the binding site for the cap structure m7GTP and that the
residues located within the pocket may have to change conformation
to accommodate the cap moiety. Although the N domain of NP is not
structurally similar to any of the cap-binding proteins (Supplementary
Table 3), its hydrophobic thymidine-binding pocket shares features
common to cap binding. Moreover, the NP cap-binding cavity has
a unique feature in that its entrance contains another hydrophobic
region that is composed of the hydrophobic residues Y319, Y209,
Y213, L265 and the acidic residue E266, which can potentially act as
the binding site for the second base of the m7GpppN (where N represents
G, C, U or A) cap structure. The entrance of the cap-binding
cavity has an oval shape with a diameter of 9-13 Å, which is a perfect
fit for the single-stranded mRNA. We propose that a loop composed
of residues K236 to S242 serves as a 'gate' for the capped template
(primer) binding and that the entire structure of m7GpppN, including
the cap m7G, the triphosphate, and at least one more nucleotide, is
embedded within the deep cavity. This binding feature is unlike other
known cap-binding proteins, in which only the m7G caps are locked in
between the sandwich, whereas the rest of the RNA molecule is
exposed.

Figure 4 | The cap-binding residues and their roles in viral RNA
transcription. Results shown are the average (n = 3) with error bars indicating
the standard deviations. a, A cap analogue dTTP is bound within the deep
cavity of the N domain of LASV NP. Original Fo − Fc map for the dTTP in blue
contoured at 2.5σ. The F176 and W164 or L172 (L120) residues form a typical
cap-binding sandwich structure. The middle cavity binds the triphosphate
moiety and the hydrophobic cavity entrance can accommodate the second base
of the cap structure. The carbon atoms are in pink for the dTTP, in yellow for
the deep cavity residues and in green for the cavity entrance residues. b, The NP
mutants were expressed at levels similar to the wild type at 15-30 ng plasmid
(WT-15, WT-30) in the transfected mammalian cells. c, Mutational analyses of
the residues within the cap-binding cavity for the transcriptional activity using
the LASV minigenome assay.

To characterize the role of the cap-binding residues in viral RNA 
transcription, we examined a panel of NP mutants with alanine sub- 
stitution of residues located inside and at the entrance of the cavity 
and that are conserved among all known arenaviruses for their ability 
to mediate the cap-dependent viral RNA transcription using the 
LASV minigenome replicon assay (Methods). Wild-type NP (with 
or without Myc tag) produced up to a 1,000-fold increase in RLuc 
reporter activity over a control reaction, and more than 100-fold 
increase even when expressed at a low level (15 ng of transfected 
NP plasmid DNA). All mutant proteins were expressed at levels similar
to those of the wild type transfected with 15-30 ng of plasmid (Fig. 4b).
Compared to the wild type, the K253A and E266A mutants completely 
lost the RNA transcription activity, and the Y319A, F176A, W164A, 
K309A and R323A mutants showed significantly decreased activity 
(Fig. 4c). R300A had a minor effect, whereas W12A and Y209A had 
no effect. It is worth noting that none of these mutants was found 
to impact the NP function in the suppression of IFN (Supplemen- 
tary Fig. 17). These functional data correlate well with the proposed 
cap-binding function of some of these conserved residues. 

The unique cap-binding feature of LASV NP, in that the entire cap 
structure m7GpppN is buried within the cavity, has significant impli- 
cations in understanding the distinctive cap-snatching mechanism of 
arenaviruses. Once NP binds and protects the 5’ cap m7GpppN, the 
rest of the mRNA molecule located outside of the cavity may be 
susceptible to viral and/or host exonuclease-mediated degradation 
and/or to endonuclease-mediated cleavage (Supplementary Fig. 9). 
This may help to explain the relatively short (1-4 nucleotides) 5'
non-templated sequences in arenavirus mRNAs. However, individual
mutation of the NP exonuclease catalytic sites did not show any
defect in viral cap-dependent RNA transcription (Fig. 3a), indicating
that the NP exonuclease activity is not required for generating
the capped primers. It is worth noting that we did not identify an
influenza polymerase PA-like endonuclease structural motif within
the LASV NP structure (Supplementary Table 4). Instead, recent studies
indicated that the LASV L polymerase protein contains an endonuclease
domain in its N terminus that is crucial for the cap-dependent
viral RNA transcription.


Conclusion 

Our structural analysis and functional assays have demonstrated that 
the C domain of LASV NP contains 3'-5' exoribonuclease activity
that is required for suppressing IFN-β induction. We have provided
evidence to suggest that the NP RNase activity is highly regulated in 
cells and proposed a novel mechanism by which the NP RNase activity 
may specifically remove the viral PAMP RNA ligands to suppress the 
production of IFN. Another important feature of LASV NP protein is 
that its N domain contains a deep cavity to bind and shield the entire 
m7GpppN cap structure, which is distinct from other known cap- 
binding proteins, and has shed light on the unique cap-snatching 
mechanism of arenaviruses. In addition, we have also identified an 
unusual zinc-binding site and the viral RNA-binding groove in the 
LASV NP structure. Taken together, these findings reveal several 
new and potentially vulnerable targets on NP for the development of 
antivirals and effective vaccines to combat LASV and other pathogenic 
arenaviruses that can cause severe haemorrhagic fever diseases in 
humans. 


METHODS SUMMARY 


The crystals were grown using the sitting-drop technique, and the native structure
was determined with the MAD data. All the NP mutations were generated using
the QuikChange site-directed mutagenesis kit (Stratagene) and confirmed by
DNA sequencing. The RNA synthesis assays used the LASV minigenome (MG)
system, and the Sendai-virus-induced IFN-β activation assay was conducted as
described. The immunostimulatory RNA-induced IFN-β activation assay was
conducted by transfecting HEK293 cells with the IFN-β-LUC promoter construct
and either wild-type or mutant NP construct, followed by Lipofectamine-2000-
mediated transfection of poly(I:C) or Pichinde-virion-isolated RNAs. Activation
of the IFN-β promoter was quantified by measuring the LUC activity.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Protein expression and purification. The full-length LASV NP gene (Josiah strain)
was cloned into the pMAL-c2X-derived pLou3 plasmid, downstream of the TEV
cleavage site following the MBP gene. This construct, encoding the N-terminal
MBP-tagged NP protein, was transformed into Rosetta cells (Novagen). After
IPTG induction at a final concentration of 0.03 mM overnight at 20 °C, the cells
were harvested by centrifugation at 8,000 r.p.m. for 20 min and suspended in TEN
buffer (20 mM Tris, pH 7.5; 0.2 M NaCl, 10% glycerol, 1 mM EDTA) with protease
inhibitors (Roche), 1 µM DNase (Sigma) and 1 mM phenylmethylsulphonyl
fluoride (Sigma). After cells were lysed by a cell disruptor (Constant Systems Ltd),
the cell lysates were collected by centrifugation at 20,000 r.p.m. for 30 min and
applied on an amylose column. The column was washed with >10 column volumes
of the sample buffer. The MBP-NP fusion protein was eluted with the TEN buffer
containing 10 mM maltose. The MBP-NP fusion protein was then cleaved by TEV
protease. The MBP portion was removed through two amylose columns, and the
NP protein was purified to homogeneity by gel filtration. Trypsin digestion
coupled with mass spectrometry confirmed that the purified LASV NP protein was
homogeneous (data not shown), with a final concentration of 7 mg ml⁻¹.
Crystallization and data collection. A Cartesian robot (Genomic Solutions) was
used to screen for optimal crystallization conditions. The native crystals were
obtained in 0.2 M LiCl and 20% PEG3350 in 1 week at 20 °C. To obtain the NP
complex with m7GpppG, m7GTP or m7GDP, the NP protein was incubated
with the individual compound at a concentration of 2 mM for 30 min on ice and the
crystallization conditions were screened. The NP complexes with other
triphosphorylated, diphosphorylated or monophosphorylated nucleotides were formed
by incubating the NP protein with 50 mM of the respective compounds for
30 min on ice and the crystallization conditions were screened. The crystallization
conditions were optimized until the resolution of the data was better than 2.5 Å.
All crystals grew in 0.2 M KCl or 0.2 M LiCl and 14-22% PEG3350. The NP
complexed with manganese ion was obtained by crystallizing the NP protein in
0.2 M MnCl₂, 25% PEG3350 followed by soaking the crystals in 0.2 M NaCl, 20%
PEG3350 and 15% glycerol three times for 15 min each. The presence of the
manganese and the zinc ions was confirmed in all the crystals by fluorescence
scanning at the Diamond Light Source, UK. All the crystals were protected by
cryoprotectants containing 15% to 20% glycerol in the crystallization conditions
before data collection on beamline I02 or I03 at the Diamond Light Source, UK. The
samarium derivative crystals were obtained by soaking the crystals overnight
in 100 mM samarium acetate, 0.2 M LiCl and 16% PEG3350, and were protected
in a cryoprotectant of 0.2 M LiCl, 16% PEG3350 and 20% glycerol. The samarium
derivative MAD data were collected at a wavelength of 1.83 Å for peak data,
1.84 Å for inflection data and 1.45 Å for remote data from a single crystal. All
the data were indexed, integrated and scaled by HKL2000 or Mosflm and Scala.
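
As a quick sanity check on the MAD wavelengths quoted above, the photon energy follows from E = hc/λ. The sketch below is illustrative only and is not part of the original data-processing workflow; the function and variable names are hypothetical, and the only constant used is the standard conversion factor hc ≈ 12.398 keV Å.

```python
# Convert X-ray wavelengths (in angstroms) to photon energies (in keV)
# using E = h*c / wavelength.
HC_KEV_ANGSTROM = 12.398419843  # h*c expressed in keV * angstrom


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths reported for the samarium derivative MAD data set.
mad_wavelengths = {"peak": 1.83, "inflection": 1.84, "remote": 1.45}
mad_energies = {name: wavelength_to_kev(w) for name, w in mad_wavelengths.items()}

if __name__ == "__main__":
    for name, energy in mad_energies.items():
        print(f"{name}: {mad_wavelengths[name]:.2f} A -> {energy:.3f} keV")
```

The peak and inflection wavelengths differ by only 0.01 Å, which corresponds to a few tens of eV, whereas the remote data set lies well above both in energy.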
Structure determination. The crystals were heavily twinned with a twinning
fraction of 0.43. The initial phases were obtained from a space group of P321 using the
MAD data and SOLVE. The initial model was built using RESOLVE,
Buccaneer and Coot. During structure refinement, it was found that the true
space group of the crystals was P3. The structures were refined using
REFMAC5, and the water molecules were added into the structure by ARP/wARP.
The Fo − Fc maps for ligands (dTTP, UTP, zinc and manganese) were
calculated before any ligand was added into the structures. The structures were
finally de-twinned using REFMAC5, and the structures were evaluated using
Molprobity.

In vitro RNA synthesis. The 30-nucleotide cRNA (sense) sequence 5'-CUGGGC
UUACCUAUUCUCAGCUGAUGACCC-3' was derived from the LASV NP
(Josiah strain) S segment (nucleotides 2186-2215 in antigenomic orientation)
and chemically synthesized by Eurogentec. The 30-nucleotide vRNA (in genomic
orientation) sequence 5'-GGGUCAUCAGCUGAGAAUAGGUAAGCCCAG-3'
was complementary to the cRNA. The cRNA (30 nucleotides) was used as one of
the three substrates for the 3'-5' exoribonuclease assay. To obtain the blunt-ended dsRNA,
both cRNA and vRNA oligonucleotides were dissolved in 0.1 M NaCl, 1 mM
EDTA and 0.1 M Tris pH 8.0 at the final concentration of 200 mM, and an equal
amount of the two oligonucleotides was mixed together and annealed in a
thermocycler as follows: 95 °C for 3 min, 68 °C for 1 min and then 4 °C.
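
The statement that the two 30-nucleotide oligonucleotides are complementary, and can therefore anneal into a blunt-ended duplex, can be checked directly from the sequences given above. The following is a minimal sketch; the helper names are hypothetical and the check assumes simple Watson-Crick pairing only.

```python
# Check that two RNA oligonucleotides are perfect reverse complements,
# i.e. that they can anneal into a blunt-ended duplex.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def forms_blunt_duplex(strand_a: str, strand_b: str) -> bool:
    """True if the strands have equal length and pair along their full length."""
    return len(strand_a) == len(strand_b) and reverse_complement_rna(strand_a) == strand_b


# Sequences as quoted in the text (5'->3').
CRNA = "CUGGGCUUACCUAUUCUCAGCUGAUGACCC"
VRNA = "GGGUCAUCAGCUGAGAAUAGGUAAGCCCAG"

if __name__ == "__main__":
    print(len(CRNA), len(VRNA), forms_blunt_duplex(CRNA, VRNA))
```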

The 5’-triphosphorylated vRNA was generated by in vitro transcription of the 
partial dsDNA template formed by the T7 promoter sequence 5’-AATTTAA 
TACGACTCACTATAGG-3' and the reverse complement of the T7 promoter 
sequence and of the LASV (Josiah strain) S segment (nucleotides 2186-2215) 
5'-CTGGGCTTACCTATTCTCAGCTGATGACCCTATAGTGAGTCGTATT 
AAATT-3’ using the T7 MEGAshortscript kit following the manufacturer’s 
instructions (Ambion). A similar strategy was used to generate the 32-nucleotide
triphosphorylated cRNA with the T7 primer and LASV (Josiah strain) S segment
(nucleotides 2186-2213) 5'-GGGTCATCAGCTGAGAATAGGTAAGCCCA
GCCTATAGTGAGTCGTATTAAATT-3'. A similar strategy was used to generate
the 60-nucleotide vRNA corresponding to LASV (Josiah strain) S segment (nucleo-
tides 2186-2213), using the partial dsDNA template formed by 5'-AATTTAAT
ACGACTCACTATAGG-3’ and 5’-GTAAATCCCTGCAGTCGGCAGGGTTTA 
CCGCTGGGCTTACCTATTCTCAGCTGATGACCCTATAGTGAGTCGTAT 
TAAATT-3’ as a template. To generate the doubly 5’-triphosphorylated dsRNA, 
equal amounts of the triphosphorylated 5’ppp-vRNA and 5’ppp-cRNA were 
annealed in vitro. To make the singly 5'-triphosphorylated dsRNA, equal amounts
of the in vitro synthesized 32-nucleotide 5'ppp-vRNA and the chemically synthe-
sized 30-nucleotide unphosphorylated cRNA were annealed in vitro. The human
18S rRNA fragment (128 nucleotides) was generated by a T7 RNA polymerase- 
directed in vitro RNA synthesis reaction, using the pTRI-RNA 18S control plasmid 
(Ambion), following the manufacturer’s instruction. 
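
The transcripts expected from the partial dsDNA templates described above can be predicted from the template-strand sequences. The sketch below is an idealized illustration, not the authors' procedure: it assumes that T7 RNA polymerase initiates at the first G following the TATA of the promoter (position +1) and runs off the 5' end of the template strand, and it ignores any non-templated nucleotide additions. All function names are hypothetical.

```python
# Predict idealized T7 run-off transcripts from template (bottom) strands
# that carry the reverse complement of the T7 promoter at their 3' end.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
T7_CORE = "TAATACGACTCACTATA"  # promoter positions -17 to -1 (top strand)


def reverse_complement_dna(seq: str) -> str:
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def runoff_transcript(template_strand: str) -> str:
    """Return the RNA copied from the template region upstream (5') of the promoter."""
    core_rc = reverse_complement_dna(T7_CORE)
    idx = template_strand.find(core_rc)
    if idx < 0:
        raise ValueError("T7 promoter complement not found in template strand")
    transcribed_region = template_strand[:idx]
    return reverse_complement_dna(transcribed_region).replace("T", "U")


# Template strands as quoted in the text (5'->3').
TEMPLATE_VRNA_30 = "CTGGGCTTACCTATTCTCAGCTGATGACCCTATAGTGAGTCGTATTAAATT"
TEMPLATE_CRNA_32 = "GGGTCATCAGCTGAGAATAGGTAAGCCCAGCCTATAGTGAGTCGTATTAAATT"
TEMPLATE_VRNA_60 = (
    "GTAAATCCCTGCAGTCGGCAGGGTTTA"
    "CCGCTGGGCTTACCTATTCTCAGCTGATGACCCTATAGTGAGTCGTAT"
    "TAAATT"
)

if __name__ == "__main__":
    for name, template in [("vRNA-30", TEMPLATE_VRNA_30),
                           ("cRNA-32", TEMPLATE_CRNA_32),
                           ("vRNA-60", TEMPLATE_VRNA_60)]:
        rna = runoff_transcript(template)
        print(name, len(rna), rna)
```

Under these assumptions the predicted lengths agree with the 30-, 32- and 60-nucleotide products named in the text, and the 32-nucleotide product is the 30-nucleotide cRNA sequence preceded by two extra G residues.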

To synthesize the capped viral mRNA transcripts corresponding to nucleotides 
992-1117 of the LASV NP gene, the DNA template was PCR amplified from the 
NP expression plasmid with a forward primer 5’-AATTTAATACGACTCAC 
TATAGGGAAAACACTGTCGTTGATCTGGAATC-3' (underlined are T7 
promoter sequences) and a reverse primer 5'-GGGTCATCAGCTGAGAATAG 
GTAAGCCCAGCGG-3’, and subjected to in vitro RNA synthesis using the 
mMESSAGE mMACHINE T7 Ultra kit (Ambion) following the manufacturer’s
instruction, except that no poly(A) tail was added. 

A plasmid phRL-CMV that encodes the T7 promoter (T7p)-directed human
β-globin gene was provided by R. Elliott and G. Blakqori. The T7p-globin DNA
fragment was purified by agarose electrophoresis after digestion of the
phRL-CMV plasmid with HindIII and SmaI. The capped human globin mRNA
transcripts were generated using the T7p-globin fragment as a template and the
mMESSAGE mMACHINE T7 Ultra kit from Ambion, and the poly(A) tail 
was added following the manufacturer’s instruction. 

The ssRNA markers (perfect RNA markers, 0.1-1 kb) were purchased from 
Novagen. The low molecular mass ssRNA marker (10-100 nucleotides) was pur- 
chased from USB. The dsRNA ladder (21-500 bp) was purchased from New England 
Biolabs. 

In vitro 3'-5' exoribonuclease assays. The in vitro 3'-5' exoribonuclease assays
were carried out in 10 µl of the reaction solution containing 0.3 M NaCl, 10%
glycerol, 20 mM Tris pH 7.5, 10 mM MnCl₂, 7 µg of either wild-type or mutant
NP proteins, and 8 units of the RNasin inhibitor (Promega), in the presence of
various substrate(s), at 37 °C for 60-100 min. The control reactions included all
components except MnCl₂, which was replaced by 20 mM EDTA. All the reactions, each in
triplicate, were stopped by the addition of EDTA to a final concentration of 20 mM.
The samples were mixed with equal volumes of RNA loading buffer (Ambion),
heated at 95 °C for 3 min, cooled on ice for 5 min, and separated in 15% or 6%
urea-polyacrylamide gel, or 2% agarose gel. The gels were stained in 0.05% ethidium
bromide for 25 min and visualized using the 2UV transilluminator (UVP).

The luciferase-based assay to quantify virus-induced and immunostimulatory
RNA-induced interferon-β activation. The Sendai-virus-induced IFN-β activation
assay was conducted as described previously. In brief, 293T cells were co-transfected
using calcium phosphate with 100 ng of a vector that expresses the firefly luciferase
(FLuc) reporter gene from a known functional promoter sequence of the IFN-β gene
(pIFNβ-LUC), variable amounts of either wild-type or mutant LASV NP vectors, and
50 ng of a β-gal-expressing plasmid for transfection normalization. At 24 h after
transfection, cells were infected with Sendai virus (at multiplicity of infection = 1) to induce
IFN-β expression. At 24 h after infection, cell lysates were prepared for luciferase and
β-gal assays. FLuc activities were normalized by the β-gal values. Each transfection was
conducted in triplicate and repeated in at least two independent experiments.

To determine whether NP can suppress the immunostimulatory RNA-induced
IFN production, HEK293 cells were transfected with pIFNβ-LUC, variable
amounts of either wild-type or mutant LASV NP vectors, and a β-gal-expressing
plasmid for transfection normalization. Eighteen hours later, cells were transfected
with either 1 µg of poly(I:C) or 250 ng of Pichinde virion RNA by Lipofectamine
2000. Luciferase activity was determined at 18 h after the immunostimulatory
RNA transfection and normalized by the β-gal activity.

Pichinde virion RNA preparation. Pichinde viruses were purified by 20% sucrose
gradient ultracentrifugation at 50,000g for 2 h. Virus RNA was extracted with
RNABee (Tel Test) according to the manufacturer’s protocol. 

LASV minigenome (MG) transcription assay. The full-length LASV L and NP 
genes (Josiah strain) were cloned into the pCAGGS vector for expression in 
mammalian cells. The LASV MG construct contains the T7 promoter-directed 
LASV S-segment-like sequences that include all the important cis-acting ele- 
ments required for viral RNA synthesis (5’ UTR, intergenic region and 
3' UTR) and encode a Renilla luciferase (RLuc) gene in place of the viral NP 
coding sequence. This LASV-based LUC-encoding minigenome (MG) RNA was 
transcribed in vitro by the T7 MEGAScript kit (Ambion) and transfected into 
293T cells, together with the LASV L expression plasmid, and wild-type or 
mutant NP expression plasmid. A β-gal expression vector was included in each
transfection to normalize for cell transfection efficiency. LUC activity was deter-
mined at 24 h after transfection, normalized by β-gal activity, and shown as fold
increase over a control sample that lacked the L expression plasmid. Each reaction
was conducted in triplicate and in at least two independent experiments. 
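
The normalization described above (luciferase activity divided by β-gal activity, then expressed as fold increase over the control lacking the L plasmid, reported as mean and standard deviation of triplicates) can be sketched as follows. This is an illustrative calculation only: the function names and the example readings are hypothetical and are not data from the paper.

```python
# Normalize luciferase readings by beta-gal activity and express them as
# fold increase over a control, summarized as mean and standard deviation.
from statistics import mean, stdev


def normalized_activity(luc: float, bgal: float) -> float:
    """Luciferase signal corrected for transfection efficiency."""
    if bgal <= 0:
        raise ValueError("beta-gal reading must be positive")
    return luc / bgal


def fold_increase(sample, control):
    """sample, control: lists of (luciferase, beta-gal) replicate readings.

    Returns (mean fold increase, sample standard deviation of the replicates).
    """
    control_mean = mean([normalized_activity(l, b) for l, b in control])
    folds = [normalized_activity(l, b) / control_mean for l, b in sample]
    return mean(folds), stdev(folds)


if __name__ == "__main__":
    # Hypothetical triplicate readings in arbitrary units.
    control_readings = [(10.0, 1.0), (10.0, 1.0), (10.0, 1.0)]
    sample_readings = [(900.0, 1.0), (1000.0, 1.0), (1100.0, 1.0)]
    avg, sd = fold_increase(sample_readings, control_readings)
    print(f"fold increase: {avg:.1f} +/- {sd:.1f}")
```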
Structure of a bacterial ribonuclease P
holoenzyme in complex with tRNA


Nicholas J. Reiter¹, Amy Osterman¹, Alfredo Torres-Larios¹†, Kerren K. Swinger¹†, Tao Pan² & Alfonso Mondragon¹


Ribonuclease (RNase) P is the universal ribozyme responsible for 5'-end tRNA processing. We report the crystal
structure of the Thermotoga maritima RNase P holoenzyme in complex with tRNAPhe. The 154 kDa complex consists
of a large catalytic RNA (P RNA), a small protein cofactor and a mature tRNA. The structure shows that RNA-RNA 
recognition occurs through shape complementarity, specific intermolecular contacts and base-pairing interactions. 
Soaks with a pre-tRNA 5’ leader sequence with and without metal help to identify the 5’ substrate path and potential 
catalytic metal ions. The protein binds on top of a universally conserved structural module in P RNA and interacts with 
the leader, but not with the mature tRNA. The active site is composed of phosphate backbone moieties, a universally 
conserved uridine nucleobase, and at least two catalytically important metal ions. The active site structure and 
conserved RNase P-tRNA contacts suggest a universal mechanism of catalysis by RNase P. 


Ribonuclease P (RNase P) is a ribonucleoprotein complex responsible 
for processing many different RNA molecules in the cell (for recent 
reviews, see refs 1-3). It is found in almost all organisms and is 
composed of one essential RNA subunit and one or more protein 
subunits. The RNA component is responsible for catalysis and can 
process RNA in vitro in the absence of protein, albeit with reduced
efficiency. The discovery that the RNA component is the catalytic
moiety helped cement the notion that RNA can be directly involved
in catalysis. RNase P is considered a remnant of an ancient RNA- 
based world and an example of an RNA-based catalyst with many 
features in common with protein-based catalysts. 

RNase P recognizes its substrate in trans and is a multiple turnover
enzyme. The preferred substrate is pre-tRNA and recognition
involves features distant from the cleavage site, such as the TΨC
loop of the tRNA acceptor stem. RNA cleavage requires divalent
metals, yet the chemical mechanism, the location of the active
site and the exact role of the protein components remain largely
undefined. In the case of bacterial RNase P, the single essential
protein improves the reaction rate by two to three orders of
magnitude, helps to stabilize the active P RNA fold, binds the 5'
leader region of the pre-tRNA substrate, and assists in product
release.

Structural studies of the RNA component reveal a two-domain
(S- and C-domains) molecule formed by single and coaxial stems
linked together by a variety of tertiary interactions, including five
conserved regions I to V (CR-I to CR-V) of P RNA that are common
to all organisms. These conserved regions cluster into two areas, one
involved in substrate recognition and the other forming the active site
scaffold.

Here we present the crystal structure of Thermotoga maritima 
RNase P holoenzyme in complex with mature tRNAPhe, and also the 
structure of the complex in the presence of a post-cleavage tRNA 
leader. The two structures help answer key questions about the mech- 
anism of this crucial ribozyme with implications for a broader under- 
standing of the general mechanisms of RNA-RNA based recognition 
and catalysis. 


Structure determination 

The components of the complex were purified separately and assembled 
by mixing and heating before crystallization (see Methods). The pre- 
tRNA was processed into mature tRNA and hence the structure repre- 
sents a ribozyme-product complex. To promote crystal formation, two 
interaction modules”’ were introduced, which had a modest effect on 
catalytic activity (Supplementary Fig. 1 and Supplementary Table 1). 
The crystals diffract anisotropically to 3.8 Å and ~4.0 Å. An initial 6 Å 
map was obtained from phases from a Ta6Br12 derivative; these phases 
helped locate heavy atoms in other derivative data sets. Multiple iso- 
morphous replacement with anomalous scattering (MIRAS) phases 
produced an excellent map to 4.1 Å where all three components were 
visible (Fig. 1, Supplementary Figs 2 and 3, and Supplementary Tables 
2-4). Density was particularly clear for the RNA molecules, whereas 
density was only clear for the protein backbone and hence the high 
resolution model of the T. maritima protein’ was positioned without 
significant rebuilding. The P RNA was built into the map using the 
structures of T. maritima’ and Bacillus stearothermophilus'* P RNA 
as guides, whereas T. maritima tRNAPhe was built using yeast tRNAPhe as a guide”. 
The structure was refined using anisotropic data to 3.8 Å resolution. 
Crystals with a tRNA leader present were obtained by soaking in a short 
oligonucleotide with and without samarium chloride and this structure 
was refined to 4.2 Å resolution. 


Overall structure 


In the complex, the tRNA sits with the acceptor stem against RNase P, 
making several tRNA-P RNA intermolecular contacts (Fig. 1 and 
Supplementary Fig. 1). The TΨC and D loops of the tRNA contact 
the S-domain, while the acceptor stem extends from the S-domain 
into the C-domain crossing the main P1/P4/P5 coaxial stem (Fig. 2 
and Supplementary Fig. 4). The 3’ CCA end of the tRNA enters a 
tunnel formed by P6/P15/P16/P17 and base pairs with nucleotides in 
the L15 region (Fig. 2 and Supplementary Figs 4 and 5), an interaction 
recognized previously~’. The 5’ end of the tRNA indicates the location 
of the active site, which is close to the region where P4, P5 and CR-IV 
intersect. The protein component is also adjacent to the 5’ end of the 
tRNA, but does not contact it. The protein contacts include the CR-IV 
and CR-V regions, the P15 stem, and the P2/P3 helix interface (Fig. 3 
and Supplementary Figs 1 and 6). The pre-tRNA leader makes exten- 
sive contacts with the protein, but few with the P RNA. 

The components of the RNase P holoenzyme are largely unchanged 
when bound to tRNA (Supplementary Fig. 7). A comparison between 
T. maritima P RNA alone” and in the complex reveals an overall 
similar fold (backbone normalized root mean square deviation 
(r.m.s.d.) ~1.1 Å) with a small change in the relative orientation of 
the two domains (Supplementary Fig. 7). The only major change in 
the P RNA structure occurs in the vicinity of the P15–P17 stems (Sup- 
plementary Figs 8 and 9). A few additional residues at the amino 
terminus were clear and follow a similar path to the B. subtilis protein” 
(Supplementary Fig. 10); no changes in the structure of the protein 
component were detected. The structures of yeast and T. maritima 
tRNAPhe show remarkable resemblance (backbone normalized 
r.m.s.d. for acceptor stem ~0.8 Å) (Supplementary Fig. 11). Further, 
a comparison with previous models reveals an excellent agreement 
with the predicted secondary structure* and a good agreement with 
the models of the complex’’**”” (Supplementary Fig. 12). 

Figure 1 | Crystal structure of the T. maritima RNase P holoenzyme in 
complex with tRNA. a, Structure of bacterial RNase P, composed of a large 
RNA subunit (338 nucleotides, ~110 kDa) and a small protein component 
(117 amino acids, ~14.3 kDa), in complex with tRNA (76 nucleotides, 
~26 kDa). The RNA component serves as the primary biocatalyst in the 
reaction and contains two domains, termed the catalytic (C, blue) and 
specificity (S, light blue) domains. The RNase P protein (green) binds the 5’ 
leader region of the pre-tRNA substrate and assists in product release. 
Transfer RNA (tRNAPhe) (red) makes multiple interactions with the P RNA 
(see Fig. 2 and Supplementary Fig. 1 for details). Regions in grey denote 
additional RNA nucleotides required for crystallization. b, Alternative view of 
the RNase P-tRNA complex, identifying the tRNA recognition regions: the 
5’ end where catalysis occurs, the 3’ CCA end, and the highly conserved TΨC 
and D loop regions. c, View of the 4.1 Å experimental electron density map 
centred on the 5’ end of tRNA. The map is represented as a dark grey mesh, 
contoured at 1.41r.m.s.d. 

Figure 2 | tRNA recognition by RNase P is mediated by RNA–RNA 
interactions. a, Schematic of the P RNA secondary structure mapping the 
tRNA-P RNA contacts observed in the crystal structure. The tRNA nucleotides 
(1•72, 2, 3, 64 and 65) and regions (5’, 3’, TΨC loop, D loop and acceptor) 
involved in direct interactions are shown in red. Intermolecular base pairs form 
between the 3’ end of tRNA (DCCA) and loop 15 (L15), where D is the 
discriminator nucleotide that serves as an identity element in tRNA biogenesis. 
P RNA nucleotides that are universally conserved (black, uppercase), conserved 
among all bacteria (grey, uppercase), or highly conserved in bacteria (black, 
lowercase) are identified. Metal ions are shown as filled pink circles, and denote 
the location of the active site (M1, M2), and other structurally important 
regions (M3, M4). Single and double dashes in red represent minor groove and 
base stacking interactions, respectively. All identified tRNA-P RNA contacts 
are within 4 Å. The crystallized T. maritima P RNA consists of eighteen paired 
helices (P), five universally conserved regions (CR-I to CR-V) (black), two 
junctions containing conserved nucleotides in bacteria (dark grey), several loop 
(L) regions, and an engineered tetraloop region (T, light grey). The coaxial P1/ 
P4/P5 stem is shown in blue, P2/P3 stems in cyan, P6/P15/P16 and L15/L17 in 
yellow, P7 and P10/P11/P12 in orange, P8/P9 in light green, and P13/P14 in 
pink (see Supplementary Fig. 1 for additional details). b, Recognition of tRNA 
by the P RNA of RNase P. The acceptor stem of tRNA (red) docks onto the P 
RNA (coloured as in a) making a series of interactions, including base stacking 
in the TΨC/D loops of tRNA and the S-domain, an A-minor interaction, and 
base pairing, ribose zipper and stacking interactions between the 5’ and 3’ ends 
of tRNA and the C-domain. The protein (green) makes no direct contacts with 
mature tRNA. Critical metal ions (M1-M4) identified are shown as magenta 
spheres. c, tRNA recognition by the S-domain. Two universally conserved P 
RNA regions (CR-II and III, dark grey) facilitate base stacking interactions with 
unstacked bases in the structurally conserved TΨC and D loops of tRNA. 
Dashed circles highlight this stacking interaction between P RNA residues 
A112, G147 and tRNA residues G19, C56. A conserved P RNA adenosine 
(A198) stacks into the minor groove of the acceptor tRNA stem. d, Recognition 
of the tRNA 3’ CCA by the C-domain. Intermolecular base pairs form between 
the 3’ tRNA (ACC) and the L15 (GGU) loop of P RNA. This interaction is 
stabilized by a structural metal (M3, magenta sphere) and a L15 ribose zipper 
conformation. 

Figure 3 | Protein-RNA contacts within the RNase P holoenzyme. a, The 
protein sits on the P RNA surface formed by conserved regions I, IV and V. The 
protein (green, shown as ribbons) additionally contacts the L15/P15 junction 
and the P2/3 helices (P RNA as coloured in Fig. 2). Labelled P RNA nucleotides 
make protein contacts (within 4 Å) and include: A45 in CR-I, U257 and G258 
in the L15/P15 junction, U293, U294, G295, A296 and U297 in CR-IV, and 
A311, G312 and A313 in CR-V. Bold nucleotides are universally conserved. 
b, Surface representation of the protein coloured by sequence conservation 
(variable (V), tan; neutral (N), light green; conserved (C), green). A highly 
conserved patch in the protein extends from the vicinity of the 5’ end of the 
tRNA, and interacts with P RNA conserved regions IV (U293-U297) and V 
(A311-A313). Other P RNA nucleotides that make protein contacts include: the 
P2 helix (C18-G22, G298-A299), the P3 helix (G37) and the L15/P15 junction 
(U257-G258). Four hundred and ninety bacterial RNase P proteins were 
included in the analysis of the sequence conservation using the ConSurf 
server’. Panels c and d show different orientations to emphasize that high 
sequence conservation is concentrated in the region of the protein that faces the 
conserved regions of the P RNA. Neutral or slightly conserved regions shown in 
these two orientations correspond to a patch that interacts with the leader. 


tRNA recognition 

The observed RNA-RNA interactions involved in substrate recog- 
nition agree with previous biochemical studies*”*”* and include (1) 
stacking between bases in the tRNA TΨC and D loops and the P RNA 
S-domain, (2) an A-minor interaction at the acceptor stem, and (3) 
the formation of canonical base pairs at the 3’ end of tRNA (Fig. 2 and 
Supplementary Fig. 1). The first interaction identifies the TΨC loop as 
a key element in recognition. Both the tRNA D and TΨC loops have 
unstacked bases (G19 and C56) that interact with unstacked bases in 
the P RNA (A112 and G147), forming G19-A112 and C56-G147 
stacks in the complex. The second major interaction involves a highly 
conserved unstacked adenosine (A198) in the P11 stem entering the 
minor groove of the tRNA acceptor stem. These interactions facilitate 
shape complementarity and help explain the central role of the 
S-domain in recognition. The third major interaction involves inter- 
molecular base pairing between the tRNA 3’ DCCA motif and the L15 
loop. This interaction is probably conserved in all bacterial and most 
archaeal RNase Ps, but not in organisms where CCA is added post- 
transcriptionally’. The fourth to last nucleotide, A73, forms a 
Watson-Crick base pair with nucleotide U256. C74 and C75 form 
Watson-Crick base pairs with G255 and G254, while the terminal 
A76 forms a weak interaction with G253. To accommodate these 
intermolecular base pairs, the two strands of L15 fold into a ribose 
zipper. In addition, a structural metal ion (M3) (Fig. 2 and Sup- 
plementary Fig. 13) binds adjacent to this P RNA-tRNA region and 
is likely to correspond to a metal ion identified biochemically’. In the 
complex, the 3’ end of the tRNA separates from the 5’ end and enters 
a wide opening formed by P6/P15/P16/P17 (Figs 1 and 2 and Sup- 
plementary Figs 4 and 5). This opening is ~20 Å in diameter, can 
easily accommodate a single-stranded RNA molecule, and is created 
when the P6/L17 pseudoknot forms (Supplementary Figs 1 and 5). 
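
The 3’ end pairing described in this section can be tabulated with a short script. This is only an illustrative sketch: the residue identities and numbers are taken from the text, while the names `WATSON_CRICK`, `PAIRS` and `classify` are hypothetical helpers introduced here, and G-U wobble pairs are deliberately not counted as canonical.

```python
# Canonical Watson-Crick partners for RNA (G-U wobble intentionally excluded).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# (tRNA residue, P RNA L15 residue) pairs as listed in the text.
PAIRS = [
    (("A", 73), ("U", 256)),
    (("C", 74), ("G", 255)),
    (("C", 75), ("G", 254)),
    (("A", 76), ("G", 253)),
]


def classify(pairs):
    """Map each 'tRNA-P RNA' pair label to True if it is Watson-Crick."""
    result = {}
    for (t_base, t_num), (p_base, p_num) in pairs:
        label = f"{t_base}{t_num}-{p_base}{p_num}"
        result[label] = (t_base, p_base) in WATSON_CRICK
    return result


if __name__ == "__main__":
    for label, canonical in classify(PAIRS).items():
        kind = "Watson-Crick" if canonical else "non-canonical"
        print(f"{label}: {kind}")
```

Only A76-G253 comes out as non-canonical, which is consistent with the text describing it as a weak interaction rather than a base pair.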


Protein—RNA interactions 


The bacterial RNase P protein structure is highly conserved, but has 
little or no sequence or structural similarity with the protein com- 
ponents of archaea or eukarya”. In the complex, the protein is near 
the 5’ end of tRNA, but is too far (over 6 Å) to make direct contacts. 

The protein sits between the P15 and P3 stems (Fig. 3 and Supplemen- 
tary Fig. 6), and also contacts the CR-IV and CR-V loop regions of P 
RNA. Comparison of bacterial sequences shows that the protein has a 
large, contiguous area with high sequence conservation (Fig. 3 and 
Supplementary Fig. 6) including important residues identified previ- 
ously'’*°*". The conserved area extends in an arch along the surface of 
the protein, starting from a point close to the 5’ end of the tRNA, and 
faces the universally conserved modules. 

To investigate the interactions with the leader, crystals were soaked 
with a short oligoribonucleotide in the presence and absence of Sm3+. 
Fourier difference maps to 4.2 Å show five phosphates of the leader 
along the conserved surface of the protein (Fig. 4 and Supplementary 
Fig. 6), but the position of the nucleobases was ambiguous. The struc- 
ture shows that the leader contacts residues Phe 17, Phe 21, Lys 51, 
Arg 52 and Lys 90 and probably interacts with Gln 28, Lys 56 and 
Arg 89, in agreement with biochemical results*”'*°°°!. The 3’ end of 
the leader is located adjacent to the 5’ end of the tRNA and near two 
conserved residues (Arg 52 and Lys 56). A metal ion is present 
between the 3’ end of the leader and the 5’ end of mature tRNA (Figs 4 and 5 
and Supplementary Fig. 14), but is too distant (>4 Å) to ligate protein 
residues directly. Leader nucleotides −1 to −3 are poised to interact 
with nucleotides A213, U294, G295 and A314 of P RNA (Fig. 4 and 
Supplementary Figs 1 and 6). These results indicate that the major role 
of the protein component is to interact with the leader to align the 
pre-tRNA in the complex, as observed previously”. 


Active site 
The location of the active site is inferred from the 5’ end of mature 
tRNA (Fig. 5 and Supplementary Fig. 15). The phosphate backbone of 
tRNA nucleotides (+1 to +3) sits on the major groove of the P4 stem 
(near A50, G51 and U52), and places the tRNA 5’ end next to the P4 
phosphate backbone and nucleotides A313 and A314 (Fig. 5 and 
Supplementary Fig. 15). The universally conserved U52 nucleotide 
is unstacked from the P4 stem and faces the tRNA 5’ end. In addition, 
the tRNA 1•72 base pair is stabilized by an adenosine stack with A213, 
a nucleotide conserved in all bacteria. 

Figure 4 | Pre-tRNA leader-protein interactions in the RNase P 
holoenzyme. a, Surface representation of the protein coloured by sequence 
conservation as in Fig. 3. The pre-tRNA 5’ leader (purple, with purple and 
orange spheres for the phosphorus and non-bridging oxygens, respectively) 
was modelled as a polyphosphate chain with five phosphates (P−1 to P−5). The 
leader follows a highly conserved patch in the protein extending from the 5’ end 
of the mature tRNA (red) and away from the P RNA. The addition of a 5’ leader 
with metal (Sm3+) reveals a second metal ion (M2). b, Alternative view of the 
pre-tRNA leader-protein interaction. Each phosphate position (P−1 through 
P−5) was visible in a 4.2 Å difference Fourier map (mFo − DFc) calculated from 
crystals where only the leader was soaked into the crystals (blue mesh, 3 r.m.s.d. 
contour levels). A second 4.2 Å difference Fourier map (mFo − DFc) calculated 
from crystals where the leader and Sm3+ metal were soaked into the crystals 
shows clearly the position of the second metal ion (magenta mesh, 3.5 r.m.s.d. 
contour level). P RNA residues poised to make contacts are labelled. Nucleotide 
U52 serves as a reference point in a and b and does not interact with the 5’ 
leader oligonucleotide. 

©2010 Macmillan Publishers Limited. All rights reserved 

Figure 5 | Structure of the RNase P active site environment. a, The active site 
is inferred from the location of the mature 5’ end of tRNA. The diagram shows 
the position of the mature tRNA (red), the leader (purple), the protein 
component (green) and the P RNA (blue and grey). A group of conserved P 
RNA nucleotides (A49-U52, A213, A313 and A314) form part of the active site. 
Two metal ions (magenta spheres) are found in the active site. b, The two active 
site metal ions (M1 and M2) are within 4 Å of the 5’ phosphate of tRNA and the 
M1–M2 metal–metal distance is ~4.8 Å. The M1 metal makes contacts 
(≤2.1 Å, solid grey bonds, labelled) with tRNA (G1 O1P) and P RNA (A50 O1P 
and U52 O4) oxygens. Other possible ligands within 3.5 Å of M1 or M2 are 
represented by dashed grey lines (Supplementary Table 5). The figure shows 
two isomorphous difference Fourier (mFo − DFc) maps. The green mesh 
corresponds to a Eu3+ soak in the absence of leader and is contoured at the 
9.5 r.m.s.d. level. The magenta mesh corresponds to a Sm3+ and 5’ leader soak 
and is contoured at the 5.5 r.m.s.d. level. The second metal is clearly visible only 
when the leader is present. c, Schematic diagram of the interactions around the 
active site. The diagram shows all residues within 8 Å of the 5’ phosphorus 
atom of tRNA. Short dashed lines represent metal ligand distances within 2.2 Å 
and longer dashed lines represent nucleotides which form canonical base pairs. 
Nucleotides in bold are universally conserved in P RNA. The P RNA, tRNA, 5’ 
leader, and protein side chains are shown in blue, red, purple and green, 
respectively. d, Proposed reaction mechanism for the endonucleolytic cleavage 
of pre-tRNA by RNase P based on the structure of the enzyme-product (E-P) 
complex and previous mechanistic studies***’. The M1 metal distance to the 5’ 
phosphate ligands (Supplementary Table 5) in the E-P complex is consistent 
with the proposed enzyme-substrate (E-S) transition state. In this proposed 
reaction scheme, M1 is ~180° from the apical O3’ position and activates a 
hydroxyl nucleophile for an in-line nucleophilic displacement, creating a new 
bond and displacing the 3’ scissile phosphate oxygen. As RNase P proceeds 
through an SN2 reaction pathway, the stereochemistry around the phosphorus 
atom undergoes a net inversion of configuration. If the pro-Rp (O2P) oxygen 
coordinates metal in the E-S complex during catalysis, as previously 
observed*”®, this would subsequently allow for the pro-Sp (O1P) oxygen to 
coordinate metal in the E-P complex, as observed in the crystal structure. 
Product release could be facilitated by a metal (M2) coordinated water, which 
would enable proton transfer to the 3’ scissile oxygen. The exact active site 
geometry and identity of other metal ligands in an E-S complex has yet to be 
established. 

A metal ion (M1), putatively magnesium, is found trapped between 
the tRNA 5’ end, the A50 and G51 phosphates, and the O4 oxygen of 
the universal U52 nucleotide and was confirmed using crystals soaked 
with Sm3+ and Eu3+ (Supplementary Figs 14 and 15). Putative M1 
metal contacts include the A50 non-bridging phosphoryl oxygen, the 
O4 oxygen of the U52 nucleobase, and the O1P oxygen at the 5’ end of 
tRNA. Other metal-ligand interactions may include: the backbone of 
A50, the phosphoryl oxygen of G51, and the 5’ end of tRNA (Sup- 
plementary Table 5). Many of these oxygen ligands have been impli- 
cated in metal coordination and catalysis****. The M1 site may also 
coincide with a site (M6) observed in the structure of B. stearother- 
mophilus P RNA*. The structure of the complex suggests that M1 
participates in catalysis by directly binding P RNA and the 5’ phos- 
phate of tRNA. 

A second metal (M2) was located in experiments where the leader 
was soaked in the presence of Sm3+. The M2 metal is in close proxi- 
mity to the phosphoryl oxygens of G51, the O3’ of the leader, and the 
5’ end of tRNA (Supplementary Table 5). The two metals observed in 
crystals soaked with the leader and Sm3+ are ~4.8 Å apart (Fig. 5 and 
Supplementary Fig. 15). The structures indicate that the active site 
includes at least two metal ions upon complex formation with pre- 
tRNA. Due to its location, the M2 metal ion could make additional 
contacts with both the tRNA and the P RNA during catalysis. 

The structures of the active site of the complex and the apo- 
ribozyme structures are similar (Supplementary Figs 7, 16 and 17), 
including the presence of a metal ion next to the P4 helix*’. With the 
exception of the U52 nucleobase (Supplementary Fig. 16), no large 
changes are observed in the active site region. A fully occupied M2 site 
is observed only in the presence of leader, suggesting that a local 
metal-dependent conformational change may occur, as previously 
reported®. The structure also reveals that the tRNA 5’ and 3’ ends splay 
and separate to interact with the P RNA (Supplementary Fig. 11), 
confirming the need for movement of the tRNA ends***’”. Although 
accommodating the upstream RNA leader probably requires local 
protein and P RNA structural changes, the location of the active site 
is not significantly altered and is largely pre-assembled. 


Mechanistic implications 

RNase P can cleave a variety of substrates’’®°*, but pre-tRNA is the 
only one that is common among all organisms. To decipher its func- 
tion, it is important to understand two different aspects of pre-tRNA 
processing by RNase P: substrate specificity and the chemical mech- 
anism of cleavage. 

tRNA recognition by RNase P involves the highly conserved tRNA 
TΨC and D loops and the CR-II and CR-III regions in the S-domain of P 
RNA. Thus, regions with high sequence and structure conservation 
are involved in specific tertiary interactions, suggesting a universal 
mode of recognition among all RNase P enzymes. The presence of unpaired 
nucleotides next to the cleavage site is also an important feature for 
pre-tRNA recognition, although it is unclear whether this is a universal 
feature of all natural substrates’. Finally, pre-tRNA is usually processed 
to form a 7-base-pair-long acceptor stem. An additional role of the 
interactions between CR-II and CR-III and tRNA may be to serve as a 
‘ruler’ that ensures that the correct lengths are processed, although 
there is some flexibility as tRNAs with acceptor stems 8 base pairs long 
can be processed”. The interaction with the 3’ CCA end is also a key 
recognition feature, but may not necessarily be an RNA-RNA inter- 
action in higher organisms. The L15 loop of P RNA is not found in 
eukarya or some archaea” and its function may be replaced by addi- 
tional protein(s), suggesting that 3’ CCA intermolecular base pairing is 
not a universal interaction. 

The second important aspect of RNase P function is the chemical 
mechanism of cleavage. Hydrolysis of a phosphodiester bond generates 
the mature 5’ RNA product. Although it is not possible to propose a 
complete mechanism from a structure at this resolution, the RNase 
P-tRNA structures, together with extensive biochemical information, 
help identify the major active site components. The structure indicates 
that at least two distinct metals play a direct role. It is possible to propose 
a transition state model (Fig. 5d) where the M1 metal directly positions 
the scissile phosphate oxygens of the substrate and enables a hydroxyl 
ion to perform an SN2-type nucleophilic substitution. In this scenario, 
the M2 metal ion stabilizes the transition state and mediates proton 
transfer to the 3’ scissile oxygen during product release, as proposed 
previously’. Other universally conserved nucleotides in the vicinity 
seem to have a structural role in forming the correct structure and 
are not directly involved in catalysis, consistent with proposals that 
sequence conservation is largely the result of strong structural con- 
straints'’. Hence, the RNase P-tRNA complex shows how the P RNA 
structure can serve as a scaffold to bind and orient metals and substrate 
properly. It seems that RNase P uses a two-metal ion catalytic mech- 
anism, similar to other mechanisms proposed based on other large 
ribozyme structures*'” and originally put forth as a general mechanism 
for many ribozymes”. 

The structural studies of the holoenzyme-tRNA complex help to 
show that all RNase P ribozymes share a common, RNA-based mech- 
anism of RNA cleavage and recognition that involves two universally 
conserved structural modules. Adaptation through the addition of 
protein increases RNase P functionality by positioning accurately 
the 5’ leader pre-tRNA substrate and by contacting conserved regions 
of the P RNA structure. The unique tertiary fold of the P RNA uses 
shape complementarity, specific RNA-RNA contacts, and intermol- 
ecular base pairing to recognize its substrate efficiently. Within this 
tertiary fold, the universally conserved regions are crucial to form the 
active site scaffold and to create regions involved in tRNA recognition. 
In addition, both P RNA and the pre-tRNA help to coordinate two 
catalytically important metal ions essential for the putative mech- 
anism of pre-tRNA cleavage. The RNase P-tRNA complex offers a 
glimpse into the transition from an ancient, RNA-based world to the 
present, protein-catalyst dominated world and affirms that RNA 
molecules can display comparable versatility and complexity. 


METHODS SUMMARY 

Crystallization. Preparation, purification and folding of T. maritima RNase P 
and tRNAPhe have been described*'”*. For crystallization, the components were 
mixed in a 1:1.1:1 (P RNA:pre-tRNA:protein) molar ratio to a concentration of 
45 µM. The mixture was heated to 94 °C (2 min), cooled to 4 °C (2 min), and after 
the addition of MgCl2 to a final 10 mM concentration, further incubated at 50 °C 
(10 min) and 37 °C (40 min). Crystals were obtained by mixing 1 µl of complex 
with 1 µl of reservoir solution (1.8 M Li2SO4, 50 mM sodium cacodylate (pH 6.0)) 
and equilibrated by vapour diffusion at 30 °C. Crystals were cryo-protected using 
reservoir solution containing 15% xylitol. 

Data collection and structure determination. Diffraction data were collected at 
100 K at the LS-CAT sector at the APS. Complete native and Ta6Br12, SmCl3, 
EuCl3 and iridium hexammine (Ir(NH3)6)3+ derivative data sets were collected. A weak 
Molecular Replacement” solution using a trimmed model of the tRNA-P RNA 
complex’” located the Ta6Br12 cluster. Multi-wavelength anomalous dispersion 
(MAD) phases“ from the cluster extended to ~6 Å, with the map showing a clear 
envelope. These phases were used to locate the other heavy atoms that were used 
to calculate a 4.1 Å MIRAS map. To locate the pre-tRNA leader, crystals were 
soaked with a T. maritima 5’ tRNA 7-nucleotide leader sequence (final concen- 
tration 0.2 mM), with and without 14 mM SmCl3. Difference maps allowed the 
placement of five pre-tRNA nucleotides and the unambiguous identification of a 
second active site metal. The experimental electron density map was of excellent 
quality and allowed model building of nearly all RNA phosphate and nucleobase 
positions and accurate placing of the protein. Model building was guided by the 
known structures'*””*'. Final Rwork and Rfree are 24.9% and 27.0%, respectively, 
with r.m.s.d. of 0.007 Å and 1.24° for bonds and angles. Figures were made with 
PyMOL”. 
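
For readers less familiar with the refinement statistics quoted above, the crystallographic R-factor is conventionally defined as below. This is the standard textbook definition and is not a formula given in this paper. Rwork is computed over the reflections used in refinement, and Rfree is computed in the same way over the test-set reflections excluded from refinement.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```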


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Preparation of the T. maritima RNase P holoenzyme-tRNAPhe ternary com- 
plex. RNA transcriptions were performed in vitro with purified His6-tagged T7 
RNA polymerase using standard protocols*'. Sequences from the T. maritima 
RNase P RNA and tRNAPhe genes were inserted into a pUC19 vector at FokI and 
BsmAI restriction sites, respectively, allowing for run-off transcription of the 
DNA plasmid after digestion with the appropriate restriction enzyme (NEB). 
Constructions of modified RNA molecules with either mutations or additions 
were performed using a QuikChange mutagenesis kit (Stratagene). RNA samples 
were purified by 6% denaturing polyacrylamide gel electrophoresis (PAGE), 
identified by ultraviolet absorbance, recovered by diffusion into 50 mM potassium 
acetate (pH 7) and 0.2 M potassium chloride, and precipitated with ethanol. tRNA 
was further purified by anion exchange (MonoQ (5/50 GL)) and gel filtration 
(HiPrep 26/60, Sephacryl S-200) chromatography (GE Health Sciences). Over- 
expression and purification of the RNase P protein from T. maritima was per- 
formed as described previously”. 

To form the RNase P holoenzyme-tRNA complex, unfolded P RNA, unfolded 
tRNA and P protein molecules were mixed at a 1:1.1:1 molar ratio in 66 mM 
HEPES, 33 mM Tris (pH 7.4), 0.1 mM EDTA (1× THE) and 100 mM 
CH3COONH4 (ref. 8). The ternary mix, at a final concentration of 45 µM, was 
incubated at 94 °C for 2 min and then cooled to 4 °C over 2 min. After addition of 
MgCl2 to a final 10 mM concentration, the reaction mixture was incubated at 
50 °C for 10 min, followed by incubation at 37 °C for 40 min, and finally cooled to 
4 °C over 30 s. 
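
As a worked illustration of the 1:1.1:1 molar ratio, the following sketch computes how much of each component a mix would contain. It rests on assumptions that are not stated in the text: that the 45 µM figure refers to the reference (1.0) components, and that the reaction volume is 100 µl. The function name `component_amounts_pmol` is hypothetical.

```python
def component_amounts_pmol(ref_conc_um, volume_ul, ratio=(1.0, 1.1, 1.0)):
    """Return pmol of each component for a given reference concentration.

    ref_conc_um: concentration (µM) of a component with ratio 1.0 (assumed).
    volume_ul:   total reaction volume in µl (assumed, not from the paper).
    ratio:       molar ratio P RNA : tRNA : protein, as quoted in the text.
    """
    names = ("P RNA", "tRNA", "protein")
    # µM multiplied by µl gives pmol directly (1e-6 mol/l * 1e-6 l = 1e-12 mol).
    return {name: ref_conc_um * r * volume_ul for name, r in zip(names, ratio)}


if __name__ == "__main__":
    for name, pmol in component_amounts_pmol(45.0, 100.0).items():
        print(f"{name}: {pmol:.0f} pmol")
```

Under these assumptions the tRNA is present in a 10% molar excess over the P RNA and the protein.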

Rational design of an RNA tertiary module to build a crystal lattice. To pro- 
mote formation of a crystal lattice, intermolecular interactions were facilitated by 
introducing a tertiary structure interaction module. Based on the T. maritima 
RNA sequence and a proposed model of the P RNA-tRNA complex””, constructs 
were designed where a tetraloop was inserted into the P12 loop (L12) of the P 
RNA and a tetraloop-receptor into the anticodon stem of tRNA (Supplementary 
Fig. 1). These two RNA regions were chosen as they were deemed to be far from 
the active site or other regions involved in specific interactions. In addition, the 
P12 stem of P RNA has a highly variable helix length across all organisms, lacks 
sequence conservation, and is non-essential or absent in several organisms*°. The 
P12 and the anticodon loop of tRNA are not known to form any functional 
contacts. The lengths of the anticodon and the P12 stems were systematically 
varied by single base pair insertions adjacent to the tetraloop and tetraloop 
receptor module, thus altering the position (~2.7 Å per base pair added) and 
orientation (~36° per base pair added) of the tetraloop receptor and the tetraloop. 
Forty two combinations of molecules were screened for crystallization conditions 
using a sparse matrix approach employing a set of crystallization conditions 
developed locally. A few combinations of RNA molecules produced crystals, with 
most of them diffracting poorly. The best crystals were obtained from a construct 
where the P12 and anticodon stems were elongated by five and three base pairs, 
respectively. Insertion of two G-U wobble pairs adjacent to the tetraloop-tetraloop 
receptor module further improved diffraction, and also created a binding site for 
an iridium hexammine cation. 
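
The effect of each inserted base pair on the placement of the module can be estimated with a few lines of arithmetic. The sketch below uses the approximate per-base-pair values quoted above (~2.7 Å rise, ~36° rotation) and assumes an ideal, straight helix; the function name `module_shift` is a hypothetical helper, not part of the original work.

```python
def module_shift(n_bp, rise_angstrom=2.7, twist_deg=36.0):
    """Approximate displacement of a helix-end module after inserting n_bp pairs.

    Returns (translation along the helix axis in Å, net rotation in degrees,
    wrapped to the range 0-360). Assumes an ideal, straight helix.
    """
    translation = n_bp * rise_angstrom
    rotation = (n_bp * twist_deg) % 360.0
    return translation, rotation


if __name__ == "__main__":
    # Insertions of five and three base pairs, the extensions used for the
    # P12 and anticodon stems in the best-diffracting construct.
    for n in (5, 3):
        shift, turn = module_shift(n)
        print(f"{n} bp: ~{shift:.1f} Å along the axis, ~{turn:.0f}° rotation")
```

With these values, ten inserted base pairs return the module to its original orientation, so screening single-base-pair steps samples the full range of rotational settings.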

Crystallization and data collection. Crystals were obtained by mixing 1 µl of 
complex with 1 µl of reservoir solution (1.8 M Li2SO4, 50 mM sodium cacodylate 
(pH 6.0)) and equilibrated by vapour diffusion in hanging or sitting drops at 30 °C. 
Gel analysis of washed crystals showed that all three components were present (data 
not shown). Attempts to crystallize the complex in the absence of protein yielded 
no crystals. Crystals suitable for data collection grew in approximately 3 weeks 
and were cryo-cooled in liquid nitrogen immediately after transfer to reservoir 
solution containing 15% xylitol. Crystals of the RNase P holoenzyme-tRNA 
ternary complex suitable for data collection grew to approximately 80-300 µm 
per side/edge, and diffract anisotropically to 3.8 Å in the best direction and ~4.0 Å 
in other directions. Crystals belong to space group P3,21 (a = b = 169.3 Å, 
c = 185 Å) and contain one molecule per asymmetric unit. 

A series of derivatized crystals were also prepared by soaking in heavy metal 
compounds. Derivatives were prepared by soaking the crystals in mother liquor 
plus the derivative and incubating for 2-24h before transferring them to cryopro- 
tectant with the derivative present and freezing them in liquid nitrogen. Successful 
derivatizations were obtained by soaking the crystals in the following compounds: 
2 mM Ta6Br12, 14 mM samarium chloride (Sm3+), 14 mM europium chloride
(Eu3+), and 15 mM iridium hexammine (Ir(NH3)6)3+. However, several of the
compounds partially precipitated upon addition to the mother liquor solution and
hence the final concentration is not known precisely. In addition, crystals with a
leader present were obtained by soaking in a 0.2 mM heptamer oligonucleotide
(5'-A(-7)A(-6)G(-5)G(-4)C(-3)G(-2)U(-1)-3') (Thermo Fisher) for 4 h with and without
14mM samarium chloride present. The sequence was chosen by selecting the most 
common nucleotide in the T. maritima tRNA leaders at each position. 


All diffraction data were collected at 100 K at the Life Science-Collaborative
Access Team (LS-CAT) sector located at the Advanced Photon Source (APS) using
Rayonix CCD detectors. As the crystals are very radiation sensitive, the data
collection range was optimized using the program MOSFLM to collect the most
complete native or anomalous data set using the minimal rotation range. Multi-wavelength
anomalous dispersion (MAD) data were collected from a tantalum
bromide cluster (Ta6Br12) derivative at three different wavelengths. Single or
multiple wavelength anomalous dispersion data were also collected from the
samarium chloride (Sm3+), europium chloride (Eu3+), and iridium hexammine
(Ir(NH3)6)3+ derivatives. Data were processed with XDS and scaled with
SCALA. All other processing was done with programs from the CCP4 suite,
except when noted. Data collection statistics for native and derivative data sets are
shown in Supplementary Table 2.

In all cases, the diffraction limits of the data were anisotropic. The extent of the 
anisotropy was determined using the Anisotropy Server and the data were
treated in three different ways: (1) without any anisotropy correction; (2) carving
the data to the limits suggested by the anisotropy server (3σ cut-off level on
amplitudes); and (3) applying an anisotropic correction to the data using the
server. For the second case, the integrated data from XDS were carved to the limits
suggested by the server and then merged and scaled with SCALA before final 
processing. In many instances, the phasing and refinement calculations were 
done separately with the complete and carved data sets and the results compared. 
Overall, the different ways of treating the data had little effect on the final results, 
even though the data collection statistics were better for the carved data set (see 
Supplementary Tables 2 and 3). 
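
The "carving" step in treatment (2) can be pictured as an ellipsoidal truncation in reciprocal space. The sketch below is a simplified illustration and not the server's actual procedure: it assumes reflections are already expressed as reciprocal-space vectors along the principal axes of the anisotropy, and the assignment of the 3.8 Å limit to the third axis is an assumption made only for the example.

```python
def within_anisotropic_limits(s_vector, d_limits):
    """Return True if a reflection lies inside the truncation ellipsoid.

    s_vector: reciprocal-space components (1/angstrom) along the principal axes.
    d_limits: resolution limits (angstrom) along the same axes.
    The ellipsoid semi-axes are 1/d, so s / (1/d) equals s * d.
    """
    return sum((s * d) ** 2 for s, d in zip(s_vector, d_limits)) <= 1.0


# Assumed limits: ~4.0 angstroms in two directions, 3.8 angstroms in the best one.
D_LIMITS = (4.0, 4.0, 3.8)

# Two hypothetical reflections, both at 3.9 angstrom resolution.
along_best_axis = (0.0, 0.0, 1.0 / 3.9)
along_weak_axis = (1.0 / 3.9, 0.0, 0.0)

print(within_anisotropic_limits(along_best_axis, D_LIMITS))
print(within_anisotropic_limits(along_weak_axis, D_LIMITS))
```

Two reflections at the same nominal resolution are treated differently depending on direction, which is why the carved data set contains fewer weak reflections and shows better merging statistics.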

Structure determination and model refinement. Molecular replacement (MR)
studies with the program PHASER using a proposed partial model of the P RNA-tRNA
complex gave a weak low resolution (25-8 Å) MR solution (Z-scores: 5.4
and 9.0 for the rotation and translation functions, respectively). Phases calculated
from the MR solution were used to locate the position of the three sites in the
Ta6Br12 cluster data set. The program SHARP was used to calculate MAD phases
using data from three different wavelengths and spherically averaged form factors
for the cluster. The solvent-flattened MAD map was of excellent quality but the
phases were only good to ~6 Å resolution. The positions of the Eu3+, Sm3+ and
(Ir(NH3)6)3+ heavy atoms were determined using the cluster phases. The parameters
from the cluster and other derivatives could not be refined simultaneously
and instead multiple isomorphous replacement with anomalous scattering
(MIRAS) phases to 4.1 Å resolution were calculated using data from the single-atom
derivatives together with phase information to 6 Å from the cluster data. The
SOLOMON solvent-flattened map was very clear (Supplementary Fig. 2) and all
three molecules were apparent in the map. The model for the P RNA-tRNA
complex fit well in many areas, but the map showed regions where the model
needed to be changed, regions that were missing in the model, like the P12 extension
and the pseudoknot region, and the position of the protein. The models for the
tRNA and P RNA were rebuilt completely using the high resolution model of yeast
tRNAPhe (ref. 22), T. maritima P RNA and B. stearothermophilus P RNA as
guides. All regions of the RNA molecules were visible in the map and regions that
were missing in the original T. maritima P RNA model were built. Some minor
corrections to the original model were needed, but overall the models for P RNA
agree well. The protein density was clear for the backbone, but not for the side
chains and hence the high resolution model of the T. maritima protein was placed
on the experimental electron density map as a rigid body with minimal rebuilding.

Refinement was performed using Refmac5 and BUSTER. Owing to the
resolution of the data, the models were restrained to enforce good hydrogen
bonding distance between Watson-Crick base pairs, planarity between base pairs
(both for Watson-Crick and non-Watson-Crick base pairs), and C3'-endo sugar
puckering for recognizable secondary structure elements. In addition, during
BUSTER refinement the protein was restrained by the high resolution structure
of the protein. Model building with Coot was interspersed with either Refmac5
or BUSTER refinement. During rebuilding, missing nucleotides were added as
well as some missing residues at the N terminus of the protein. Mg2+ ions were
included at positions that had high density peaks in residual maps and also
coincided with heavy atom sites. Other large peaks in the native data set that
coincided with phosphate positions in the leader-soaked crystals were modelled
as phosphate ions. No individual atomic or group temperature factors were
refined, only an overall anisotropic temperature factor. The final stages of the
refinement were done with the program BUSTER. The refinement was done both
with a carved data set where data outside the anisotropic diffraction limits (3σ
cut-off) were excluded and also with a complete data set (isotropic) to the highest
resolution limit (Supplementary Table 3). No significant difference was noted in
the two refinements and the refinement statistics and electron density maps
calculated from either data set were also virtually identical. It seems that the
anisotropic temperature factor correction in the refinement programs adequately
modelled the modest anisotropy of the data.

The final model for the P RNA includes nucleotides 1 to 338. Only the phosphate 
backbone was modelled for nucleotides 39, 241 and 314-317. In addition, 
9 nucleotides were inserted between nucleotides 130 and 136 to account for the 
extension added for crystallization (Supplementary Fig. 1). The final model for the 
tRNA includes nucleotides 1-76, but only the phosphate backbone was modelled 
for nucleotides 16, 17 and 20. The crystallization module added eight extra nucleo- 
tides incorporated at the end of the anticodon stem (Supplementary Fig. 1). Nearly 
the entire anticodon stem and anticodon loop were altered to accommodate the 
tetraloop receptor and altered anticodon loop. The protein model includes residues
6 to 117. The positions of all side chains were ambiguous in the map and were
not rebuilt, but kept as much as possible as in the original 1.2 Å model (PDB ID
1NZ0) during refinement. Side chains that collided with the RNA were rebuilt
when needed. There are four Mg2+ and two phosphate ions in the model of the
complex. The final model to 3.8 Å resolution has an overall Rwork of 24.9% and Rfree
of 27.0% with a root mean square deviation (r.m.s.d.) from target values of 0.007 Å
and 1.24° for bonds and angles, respectively. The model in the presence of the
leader includes an additional polyphosphate molecule with five phosphates and
two Mg2+ ions coinciding with metal ions M1 and M2. A total of five Mg2+ ions
were modelled into the complex that contains the 5' polyphosphate leader backbone.
The final model to 4.21 Å resolution has an overall Rwork of 25.8% and Rfree of
26.7% with an r.m.s.d. of 0.007 Å and 1.23° for bonds and angles, respectively (see
Supplementary Tables 3 and 4). 

Model superpositions were done with programs from the CCP4 suite, lsqman
and Coot. Diagrams were made with PYMOL. Coordinates and structure factors
have been deposited in the PDB with accession numbers 3OK7 and 3OKB.

Activity assays of RNase P holoenzyme. Cleavage assays measuring $k_{cat}/K_m$
under single turnover conditions were performed on the RNase P and pre-tRNA
constructs that gave the best diffracting crystals. The pre-tRNA (with a
single nucleotide leader (−1)) which yielded crystals and a control pre-tRNA
(containing a T. maritima nine nucleotide leader (−9)) were radioactively
labelled at their 5' ends. Labelled substrates were purified over a 10% denaturing
polyacrylamide gel and identified by 32P-phosphorimaging. The holoenzyme was
folded and cleavage reactions were performed under the same conditions as the
folding reaction (1× THE, 10 mM MgCl2, 0.1 M CH3COONH4, 37 °C). The
enzyme activities of both the modified RNase P which gave crystals and the
T. maritima wild-type RNase P were tested. The reaction was initiated by mixing
pre-folded RNase P holoenzyme (25, 50 and 100 nM) with pre-folded pre-tRNA
substrate (<4 nM), incubated for various times (t = 0, 0.25, 1, 4 and 16 min), and 
subsequently quenched by adding 9 M urea, 50 mM EDTA. All reaction mixtures 
were loaded directly on a 15% denaturing polyacrylamide gel which separated the 
substrate from the product(s). To observe unambiguously the products of the 
leader (−1) pre-tRNA, thin layer chromatography (TLC) was also performed
with polyethyleneimine (PEI)-cellulose coated plates, where the quenched reaction
mixture was spotted and run in a 5% acetic acid/100 mM NH4Cl solution.
The dried gels and the TLC plates were exposed to a phosphorimaging screen and
the reaction profile was quantified by a phosphorimager (Fuji Medical) using
ImageGauge software. A plot of the percentage of product over time gave the
cleavage reaction rate for each concentration. Single turnover conditions assuming
a first order reaction follow the equation $P = P_\infty(1 - e^{-k_{obs}t})$, where $P$ is the
fraction of pre-tRNA cleaved, $P_\infty$ is the fraction of pre-tRNA cleaved at the end
of the reaction, and $k_{obs}$ is the observed reaction rate constant. By measuring $k_{obs}$
at different enzyme concentrations it is possible to obtain $k_{cat}/K_m$ assuming Michaelis-
Menten kinetics.
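
The analysis described above can be sketched in a few lines. This is a minimal illustration rather than the authors' procedure: it fits the first-order equation by least squares to synthetic, noise-free data and then estimates the second-order rate constant as the slope of the observed rate constant against enzyme concentration through the origin, which assumes enzyme concentrations well below the Michaelis constant. The time points and enzyme concentrations come from the text; the rate constant, the endpoint and all function names are hypothetical.

```python
import math

# Time points (min) and enzyme concentrations (nM) quoted in the text.
TIMES_MIN = [0.0, 0.25, 1.0, 4.0, 16.0]
ENZYME_NM = [25.0, 50.0, 100.0]


def first_order(t, p_inf, k_obs):
    """Fraction cleaved at time t: P = P_inf * (1 - exp(-k_obs * t))."""
    return p_inf * (1.0 - math.exp(-k_obs * t))


def _best_p_inf(times, fractions, k_obs):
    """For a fixed k_obs the model is linear in P_inf, so solve it in closed form."""
    basis = [1.0 - math.exp(-k_obs * t) for t in times]
    return sum(p * b for p, b in zip(fractions, basis)) / sum(b * b for b in basis)


def _sse(times, fractions, log_k):
    """Sum of squared residuals at a given log(k_obs), with P_inf optimised."""
    k_obs = math.exp(log_k)
    p_inf = _best_p_inf(times, fractions, k_obs)
    return sum((p - first_order(t, p_inf, k_obs)) ** 2 for t, p in zip(times, fractions))


def fit_first_order(times, fractions, k_lo=1e-4, k_hi=1e2, steps=200):
    """Coarse log-spaced scan of k_obs followed by golden-section refinement."""
    lo, hi = math.log(k_lo), math.log(k_hi)
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    best = min(range(steps + 1), key=lambda i: _sse(times, fractions, grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, steps)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > 1e-10:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if _sse(times, fractions, c) < _sse(times, fractions, d):
            b = d
        else:
            a = c
    k_obs = math.exp(0.5 * (a + b))
    return _best_p_inf(times, fractions, k_obs), k_obs


def slope_through_origin(x, y):
    """Least-squares slope of y = m * x; used here as the k_cat/K_m estimate."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Synthetic, noise-free data generated from hypothetical parameters.
TRUE_KCAT_OVER_KM = 0.02  # nM^-1 min^-1, illustrative only
TRUE_P_INF = 0.9          # illustrative only

k_obs_values = []
for enzyme in ENZYME_NM:
    data = [first_order(t, TRUE_P_INF, TRUE_KCAT_OVER_KM * enzyme) for t in TIMES_MIN]
    _, k_obs = fit_first_order(TIMES_MIN, data)
    k_obs_values.append(k_obs)

kcat_over_km = slope_through_origin(ENZYME_NM, k_obs_values)
print(f"k_obs (min^-1): {[round(k, 3) for k in k_obs_values]}")
print(f"k_cat/K_m estimate: {kcat_over_km:.4f} nM^-1 min^-1")
```

With real gel data the fit would carry noise, and a dedicated nonlinear least-squares routine with error estimates would be the more usual choice; the scan-and-refine approach is used here only to keep the example dependency-free.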

Quantitative reactivity profiling predicts
functional cysteines in proteomes 


Eranthie Weerapana, Chu Wang, Gabriel M. Simon, Florian Richter, Sagar Khare, Myles B. D. Dillon,
Daniel A. Bachovchin, Kerri Mowen, David Baker & Benjamin F. Cravatt


Cysteine is the most intrinsically nucleophilic amino acid in proteins, where its reactivity is tuned to perform diverse 
biochemical functions. The absence of a consensus sequence that defines functional cysteines in proteins has hindered 
their discovery and characterization. Here we describe a proteomics method to profile quantitatively the intrinsic 
reactivity of cysteine residues en masse directly in native biological systems. Hyper-reactivity was a rare feature 
among cysteines and it was found to specify a wide range of activities, including nucleophilic and reductive catalysis 
and sites of oxidative modification. Hyper-reactive cysteines were identified in several proteins of uncharacterized 
function, including a residue conserved across eukaryotic phylogeny that we show is required for yeast viability and 
is involved in iron-sulphur protein biogenesis. We also demonstrate that quantitative reactivity profiling can form the 
basis for screening and functional assignment of cysteines in computationally designed proteins, where it discriminated 
catalytically active from inactive cysteine hydrolase designs. 


Large-scale scientific endeavours such as genome sequencing and 
structural genomics are providing a wealth of new information on 
the full complement of proteins present in eukaryotic and prokaryotic 
organisms. Many of these proteins, however, remain partly or completely
unannotated with respect to their biochemical activities. New
methods are therefore needed to characterize protein function on a
global scale. Much effort is currently devoted to the characterization
of post-translational modification events because these covalent
adducts can have profound and dynamic effects on protein activity.
Another frequently overlooked parameter that defines functional 
‘hotspots’ in the proteome is amino acid side-chain reactivity, which 
can vary by several orders of magnitude for a given residue depending 
on local protein microenvironment. Methods to measure side-chain 
reactivity en masse directly in complex biological systems have not yet 
been described, and as such, the reactive landscape of the proteome 
remains largely unexplored. 

Among the protein-coding amino acids, cysteine is unique owing 
to its intrinsically high nucleophilicity and sensitivity to oxidative 
modification. The pKa of the free cysteine thiol is between 8 and 9,
meaning that only slight perturbations in the local protein microenvironment
can result in ionized thiolate groups with enhanced
reactivity at physiological pH. Diverse families of enzymes use
cysteine-dependent chemical transformations, including proteases,
oxidoreductases and acyltransferases. In addition to its role in catalysis,
cysteine is subject to several forms of oxidative post-translational
modification, including sulphenation (SOH), sulphination (SO2H),
nitrosylation (SNO), disulphide formation and glutathionylation,
which endow it with the ability to serve as a regulatory switch on
proteins that is responsive to the cellular redox state.

Functional cysteines, regardless of whether they are catalytic residues 
or sites of post-translational modification, do not conform to a canon- 
ical sequence motif, which complicates their systematic identification 
and characterization. pKa measurements can identify cysteine residues
with heightened nucleophilicity (or 'hyper-reactive' cysteines), but
this requires purified protein and detailed kinetic and mutagenic
experiments that cannot be performed on a proteome-wide scale.
Additional methods have been introduced to computationally predict
redox-active cysteines, identify cysteines with specific modifications,
and qualitatively inventory electrophile-modified cysteines in proteomes.
Some of these studies have provided suggestive evidence
that nucleophilic cysteines may possess a variety of important functions,
although the non-quantitative methods used in each case
precluded a robust and systematic evaluation of this potential relation- 
ship. We adopted a different strategy to globally characterize cysteine 
functionality in proteomes based on quantitative reactivity profiling 
with isotopically labelled, small-molecule electrophiles. 


Quantifying cysteine reactivity in proteomes 
Our approach, termed isoTOP-ABPP (isotopic tandem orthogonal 
proteolysis—activity-based protein profiling), has four features to 
enable quantitative analysis of native cysteine reactivity (Fig. 1a): (1) 
an electrophilic iodoacetamide (IA) probe, to label cysteine residues in 
proteins, that also has (2) an alkyne handle for ‘click chemistry’ con- 
jugation of probe-labelled proteins’ to (3) an azide-functionalized 
TEV-protease recognition peptide containing a biotin group for strep- 
tavidin enrichment of probe-labelled proteins”, and (4) an isotopically 
labelled valine for quantitative mass spectrometry (MS) measurements 
of IA-labelled peptides across multiple proteomes (Supplementary 
Fig. 1). After tandem on-bead proteolytic digestions with trypsin 
and TEV protease’*”, probe-labelled peptides attached to isotopic 
tags are released and analysed by liquid-chromatography-high- 
resolution MS to identify I[A-modified cysteines and quantify their 
extent of labelling based on MS2 and MS1 profiles, respectively. An 
isoTOP-ABPP ratio, R, is generated for each identified cysteine that 
reflects the difference in signal intensity between light and heavy tag- 
conjugated proteomes. 
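
The ratio R can be illustrated as the quotient of integrated MS1 peak areas for the light- and heavy-tagged forms of one peptide. The sketch below is an assumption-laden illustration, not the authors' software: the traces are invented, the trapezoid rule stands in for whatever integration the original analysis used, and the function names are hypothetical.

```python
def trapezoid_area(times, intensities):
    """Integrate a chromatographic trace with the trapezoid rule."""
    return sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def isotop_abpp_ratio(times, light, heavy, start, end):
    """Light:heavy ratio of MS1 peak areas between the chosen peak boundaries."""
    window = [i for i, t in enumerate(times) if start <= t <= end]
    t = [times[i] for i in window]
    area_light = trapezoid_area(t, [light[i] for i in window])
    area_heavy = trapezoid_area(t, [heavy[i] for i in window])
    return area_light / area_heavy


# Hypothetical co-eluting traces (retention time in min, arbitrary intensity units).
rt = [30.0, 30.1, 30.2, 30.3, 30.4, 30.5, 30.6]
heavy_trace = [0.0, 2.0, 8.0, 12.0, 8.0, 2.0, 0.0]
light_trace = [2.0 * x for x in heavy_trace]  # twice as much light-tagged peptide

ratio = isotop_abpp_ratio(rt, light_trace, heavy_trace, start=30.0, end=30.6)
print(f"R (light:heavy) = {ratio:.2f}")
```

Because both isotopic forms co-elute, using the same peak boundaries for the two traces keeps the ratio insensitive to where exactly the boundaries are drawn.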

We first verified the accuracy of isoTOP-ABPP by labelling varying 
amounts of a mouse liver proteome (1X, 2X, 4X) with the IA probe 

Figure 1 | A quantitative approach to globally profile cysteine reactivity in
proteomes. a, isoTOP-ABPP involves proteome labelling, click-chemistry-based
incorporation of isotopically labelled cleavable tags, and sequential on-bead
protease digestions to provide probe-labelled peptides for MS analysis.
The IA probe is shown in the inset. LC-MS/MS, liquid-chromatography-MS/MS.
b, Measured isoTOP-ABPP ratios for peptides from MCF7 cells labelled
with four pairwise IA probe concentrations (10:10 µM, 20:10 µM, 50:10 µM,
100:10 µM). The blue box highlights peptides with low isoTOP-ABPP ratios
(R < 2.0). Chromatographs for creatine kinase B (CKB; low ratio) and plastin 2


followed by click chemistry conjugation with either the heavy or light 
variants of the azide-TEV-biotin tag. The observed signals for labelled 
cysteines closely matched the expected proteome ratios (R1:1 ≈ 1,
R2:1 ≈ 2, or R4:1 ≈ 4, respectively; Supplementary Fig. 2). A representative
MS/MS profile of an IA-labelled peptide from our proteomic
experiments is provided in Supplementary Fig. 3. 

In contrast to traditional cysteine-alkylating protocols for proteo- 
mics that use millimolar concentrations of IA to stoichiometrically 
modify all cysteines in denatured proteins, we proposed that, by
applying low (micromolar) concentrations of the IA probe to native
proteomes, differences in the extent of alkylation would reflect differences
in cysteine reactivity, rather than abundance. This hypothesis
predicts that the reactivity of cysteines can be measured on a proteome-wide
scale in isoTOP-ABPP experiments that compare low versus high
concentrations of IA probe, where hyper-reactive cysteines would be
expected to label to completion at low probe concentrations (generating
isoTOP-ABPP ratios with R[high]:[low] ≈ 1) and less reactive
cysteines should show concentration-dependent increases in IA-probe
labelling (generating isoTOP-ABPP ratios with R[high]:[low] > 1)
(Supplementary Fig. 4). We tested this idea by performing four parallel
isoTOP-ABPP experiments with the soluble proteome of the human
breast cancer cell line MCF7 using pair-wise IA-probe concentrations
of 10:10 µM, 20:10 µM, 50:10 µM and 100:10 µM (light:heavy). More
than 800 probe-labelled cysteines were identified on 522 proteins, the
vast majority of which exhibited escalating isoTOP-ABPP ratios
(Fig. 1b) expected for reactions that did not reach completion over
the tested probe concentration range. In contrast, a small subset of
cysteines (<10%) showed nearly identical ratios at all probe concentrations
tested (R1:1 ≈ R2:1 ≈ R5:1 ≈ R10:1 ≈ 1, Fig. 1b, shaded blue
box). An expanded analysis of multiple human cancer line (Supplementary
Fig. 5 and Supplementary Table 1) and mouse tissue
(Supplementary Fig. 6 and Supplementary Table 2) proteomes treated
with low (10 µM) and high (100 µM) IA-probe concentrations
revealed consistent isoTOP-ABPP ratios for individual cysteine resi- 
dues, indicating that the propensity of a cysteine to display high IA 
reactivity is an intrinsic property of the residue (and presumably its 
local protein environment), and not, in general, contingent on features 
specific to a particular cell or tissue. Additionally, isoTOP-ABPP 
ratios showed no correlation with either protein abundance or pep- 
tide ion intensity (Supplementary Fig. 7), indicating that they were 
(LCP1; high ratio) are shown, with elution profiles for heavy- and light-labelled
peptides in blue and red, respectively, and green lines depicting peak 
boundaries used for quantification. Isotopic envelopes are shown for light- and 
heavy-labelled peptides with green lines representing predicted values. 
Sequences are shown for tryptic peptides containing IA-probe-labelled 
cysteines (marked by asterisks) in CKB and LCP1. RT, retention time. 
Additional chromatographs from isoTOP-ABPP experiments are in 
Supplementary Table 7. 


independent of potential MS-based ionization sources for saturation. 
Finally, we confirmed that similar isoTOP-ABPP ratios were obtained 
for cysteines in reactions where time rather than the concentration of 
probe was varied (Supplementary Fig. 8 and Supplementary Table 3), 
confirming that low isoTOP-ABPP ratios reflect rapid reaction kinetics
(hyper-reactivity), rather than saturable binding interactions (see 
Supplementary Discussion). 
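
The link between low ratios and rapid kinetics can be made concrete with a simple pseudo-first-order labelling model. This is a textbook assumption introduced here for illustration, not the authors' analysis: the fraction of a cysteine labelled is taken as 1 − exp(−k·[IA]·t), so the ratio between high and low probe concentrations approaches 1 for a fast-reacting residue and approaches the concentration ratio for a slow one. The rate constants and the labelling time below are hypothetical.

```python
import math


def fraction_labelled(k, probe_conc, time):
    """Pseudo-first-order labelling: 1 - exp(-k * [probe] * t)."""
    return -math.expm1(-k * probe_conc * time)


def predicted_ratio(k, high=100.0, low=10.0, time=1.0):
    """Expected high:low ratio for a cysteine with second-order rate constant k.

    Concentrations are in micromolar (100 and 10, as in the text); the time
    unit and the values of k are hypothetical.
    """
    return fraction_labelled(k, high, time) / fraction_labelled(k, low, time)


for k in (1e-4, 1e-2, 1.0):
    print(f"k = {k:g}: predicted R = {predicted_ratio(k):.2f}")
```

Under this model the ratio is bounded between 1 and the ratio of the two probe concentrations, and it falls monotonically as the rate constant rises, consistent with the interpretation of low ratios as hyper-reactivity.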


Hyper-reactivity predicts cysteine functionality 

We next sought to assess the functional ramifications of the special 
subset of cysteines that showed hyper-reactivity in isoTOP-ABPP 
experiments. We first noted that multiple sites of IA-probe labelling 
on the same protein often showed markedly different isoTOP-ABPP 
ratios. For example, the glutathione S-transferase GSTO1 was labelled 
on four cysteine residues, three of which showed high ratios (C90,
C192 and C237 had ratios of R10:1 = 5.6, 7 and 5.4, respectively),
whereas the fourth (C32) showed a low ratio of R10:1 = 0.9 (Fig. 2a).
Interestingly, C32 is the active-site nucleophile of GSTO1 (ref. 22).
Acetyl-CoA acetyltransferase-1 (ACAT1) was also labelled on four
cysteines and three showed high ratios (C119, C196 and C413 showed
ratios of R10:1 = 8.8, 8.2 and 4, respectively), whereas the fourth, the
active site nucleophile C126 (ref. 23), yielded a low ratio of R10:1 = 1.1
(Fig. 2a). 

The aforementioned findings indicated that heightened IA reactivity 
might be a good predictor of cysteine functionality in proteins. To 
examine this premise more systematically, we queried the Universal 
Protein Resource (UniProt) database to retrieve functional annota- 
tions for the 1,082 cysteine residues labelled by the IA probe. This 
analysis revealed that the most hyper-reactive cysteines were remark- 
ably enriched in functional residues, with 35% of the cysteines with 
R10:1 < 2 being annotated as active-site nucleophiles or redox-active
disulphides compared to 0.2% for all cysteine residues in the UniProt 
database (Fig. 2b, c, Supplementary Fig. 9 and Supplementary Tables 4 
and 5). Hyper-reactive cysteines were also, as a group, more conserved 
across eukaryotic evolution (Supplementary Fig. 10). A broader survey 
of hyper-reactive cysteines identified several that have been ascribed 
functional properties in the literature despite lacking annotation in 
UniProt (Supplementary Fig. 11). For example, a single hyper-reactive 
cysteine C108 (R10:1 = 1.0) was identified in the uncharacterized
protein D15Wsu75e. This protein and its orthologues are predicted 


9 DECEMBER 2010 | VOL 468 | NATURE | 791 


©2010 Macmillan Publishers Limited. All rights reserved 


[Figure 2 graphic. a, Chromatograph panels for GSTO1 (C32, C90, C192, C237) and ACAT1. b, Pie charts for R10:1 < 2.0, 2.0 < R10:1 < 5.0, R10:1 > 5.0 and all cysteines; legend: annotated active site + redox-active disulphide; annotated structural disulphide and others; no functional annotation. c, Plot legend: labelled cysteine; annotated active-site nucleophile or redox-active disulphide; moving average.]


Figure 2 | Hyper-reactive cysteines are highly enriched in functional 
residues. a, Chromatographs from an isoTOP-ABPP experiment using 
100:10 µM IA probe are shown for peptides from GSTO1 (top) and ACAT1
(bottom). The cysteine nucleophiles (asterisks) show low ratios (R10:1 ≈ 1),
whereas other cysteines show high ratios (R10:1 ≥ 4). b, Pie charts illustrating
the percentage of functionally annotated cysteines for three isoTOP-ABPP 
ratio ranges, including an average derived from all cysteines in the UniProt 


to be cysteine proteases based on conservation of a prototypical Cys-His
catalytic dyad. Interestingly, C108 corresponds to the putative
cysteine nucleophile of this catalytic motif and a recent crystal struc- 
ture confirms the proximity of C108 to a conserved histidine (H38) 
(Supplementary Fig. 12). Thus, quantitative reactivity profiling sup- 
ports structural predictions that D15Wsu75e is a functional cysteine 
protease. 

Hyper-reactive cysteines also corresponded to sites for post-translational
modification. For instance, C101 (R10:1 = 1.92) in the
protein arginine methyltransferase PRMT1 has been identified as a
site of modification by the endogenous oxidative product 4-hydroxy-2-nonenal
(HNE). This cysteine, although nonessential for catalytic
function, is an active site residue that makes direct contact with the
S-adenosylmethionine cofactor (Fig. 3a). Interestingly, we found that
HNE inhibited both the IA-labelling (Fig. 3b) and catalytic activity
(Fig. 3c) of wild-type PRMT1. A C101A mutant of PRMT1 showed
substantially reduced IA-labelling (Fig. 3b) and HNE sensitivity
(Fig. 3c). These data indicate that PRMT1 may be regulated by oxidative
stress pathways through selective HNE modification of its hyper-reactive,
active-site C101 residue. Additional hyper-reactive cysteines
represented sites for glutathionylation (CLIC1 (C24), CLIC3 (C25)
and CLIC4 (C35); R10:1 = 2.02, 1.07 and 1.45, respectively) and nitrosylation
(RTN3; C42, R10:1 = 0.78). These data, taken together, indicate
that heightened reactivity is not only a feature of catalytic cysteines,
but also of ‘non-catalytic’, active-site cysteines, as well as those that 
undergo various forms of oxidative modification. 


Function of the hyper-reactive cysteine in FAM96B 


Intrigued by the diverse functional properties shown by hyper-reactive
cysteines, we reasoned that critical activities might be inferred for such 
residues in hitherto uncharacterized proteins. A survey of the cysteines 
database. c, Correlation of isoTOP-ABPP ratios with functional annotations
from the UniProt database where active-site nucleophiles or redox-active
disulphides are shown in red, and all other cysteines in black. A moving average
(window of 50) of functional residues is shown as a dashed blue line,
demonstrating a profound enrichment within R10:1 < 2.0. Data are from
experiments in three human cancer cell lines (MCF7, MDA-MB-231 and 
Jurkat). 
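
The moving average in panel c can be sketched as follows. The caption gives only the window size (50); the sketch assumes that cysteines are first ranked by ratio and that each window covers 50 consecutive ranked residues, which may differ from how the original curve was computed. The annotation flags are synthetic. For scale, the enrichment quoted in the main text (35% of cysteines with low ratios annotated as functional, against 0.2% of all cysteines in UniProt) corresponds to roughly a 175-fold difference.

```python
def moving_average(flags, window=50):
    """Fraction of functionally annotated residues in each run of `window` ranked cysteines.

    flags: 1 if the cysteine is annotated as functional, else 0, ordered by
    increasing isoTOP-ABPP ratio. Returns one value per complete window.
    """
    averages = []
    running = 0
    for i, flag in enumerate(flags):
        running += flag
        if i >= window:
            running -= flags[i - window]  # drop the residue that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Synthetic ranking: annotated residues concentrated among the lowest ratios.
ranked_flags = [1] * 20 + [0] * 80

averages = moving_average(ranked_flags, window=50)
print(f"windows: {len(averages)}, first: {averages[0]:.2f}, last: {averages[-1]:.2f}")
```

A running sum keeps the calculation linear in the number of cysteines, which matters little at this scale but avoids re-summing each window.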


displaying low isoTOP-ABPP ratios uncovered the highly conserved 
C93 (R10:1 = 1.15) in the uncharacterized protein FAM96B (Supplementary
Fig. 13). FAM96B has close orthologues in many organisms
including the YHR122W protein from the budding yeast
Saccharomyces cerevisiae, which shows 52% identity with human
FAM96B, including conservation of C93 (the corresponding residue
in YHR122W is C161). The gene encoding YHR122W is essential for
yeast viability, and we found that expression of wild-type YHR122W,
but not the C161A mutant of YHR122W, could rescue a yeast strain in
which the YHR122W gene was conditionally suppressed (Fig. 4a and 

Figure 3 | Functional characterization of the hyper-reactive cysteines in
PRMT1. a, Crystal structure of rat PRMT1 (green, PDB accession code
1ORI) showing the hyper-reactive cysteine C101 in contact with an
S-adenosylhomocysteine (SAH) cofactor (cyan). b, Wild type (WT) and C101A 
mutant of human PRMT1 were labelled with the IA probe, followed by click 
chemistry to incorporate a fluorescent rhodamine tag. In-gel fluorescence 
demonstrates robust labelling of the wild-type but not C101A mutant PRMT1, 
and shows that IA-probe labelling of wild-type PRMT1 is inhibited by HNE 
(upper panel). Lower panel shows Coomassie blue staining for treated protein 
samples. c, Catalytic activity of purified wild-type, but not C101A mutant 
PRMT1 is inhibited by HNE as measured by monitoring transfer of 3H-methyl
from 3H-S-adenosylmethionine (SAM) to a histone 4 substrate.

Figure 4 | Functional characterization of YHR122W/FAM96B.
a, Expression of wild type and a C161A mutant of YHR122W in a yeast strain
with a doxycycline (dox)-repressible YHR122W gene demonstrated a
dominant-negative phenotype on induction of the C161A mutant expression
(−dox/+gal, middle panel) and rescue of viability by expression of wild type,
but not the C161A mutant of YHR122W (+dox/+gal, right panel). b, The
cytosolic FeS cluster assembly pathway contains multiple proteins with hyper- 
reactive cysteines (in red). YHR122W/FAM96B (YHR) is a putative member of 


Supplementary Fig. 14). These data confirm the importance of C161 for 
the in vivo function of YHR122W and, by extension, other members of 
the FAM96B family. 

We also observed that expression of the C161A mutant of 
YHR122W caused a severe growth defect in non-suppressive media 
indicative of a dominant-negative phenotype (Fig. 4a and Supplementary
Fig. 14). This result indicates that the YHR122W protein may
engage in protein complexes that are sequestered by the C161A 
mutant, thereby disrupting the activity of the wild-type protein. 
Consistent with this premise, queries of the Saccharomyces genome 
databank (SGD) revealed that YHR122W has been found in several 
large-scale protein interaction studies to bind to proteins involved in 
cytosolic iron-sulphur (FeS) cluster assembly, namely Nar1 and Cia1
(ref. 30; Fig. 4b). We found that the activity of the FeS-client protein
isopropylmalate isomerase (Leu1) was markedly reduced in
YHR122W-deleted yeast, and this reduction was substantially rescued 
by expression of the wild-type YHR122W protein (Fig. 4c). These data 
support a role for the YHR122W/FAM96B protein in FeS-protein 
biogenesis. We also note that reactive cysteines seem to be a common 
feature of proteins in the FeS-protein assembly complex, including the 
human orthologues of Nar1, Met18 and Cfd1 (NARF, MMS19 and
NUBP2, respectively) (R10:1 = 0.91, 2.2 and 2.9, respectively) (Supplementary
Fig. 11), where they may assist in the transfer of assembled
FeS clusters to client proteins.


Predicting functional cysteines in designed proteins 
The marked correlation between cysteine hyper-reactivity and func- 
tionality observed in native proteomes led us to ask whether this rela- 
tionship would extend to de novo designed proteins. We compared the 
IA labelling of twelve proteins that were computationally designed to 
act as cysteine hydrolases. These proteins originated from structurally 
distinct scaffolds and were all designed to contain cysteine-histidine 
dyads within an active site cavity (see Supplementary Methods for 
more details). Two of the designed proteins, ECH13 and ECH19, 
showed significant hydrolytic activity using a fluorogenic ester sub- 
strate, whereas the other ten designs were inactive (Fig. 5a and 
Supplementary Fig. 15a). 

We first evaluated IA labelling of protein designs using a clickable, 
fluorescent reporter tag and SDS-polyacrylamide gel electrophoresis 
this network based on protein-protein interaction studies (see
http://www.yeastgenome.org/). This panel was adapted from ref. 30. c, Doxycycline
treatment of the YHR122W-repressible yeast strain significantly decreased the
activity of the cytosolic FeS enzyme Leu1, and this activity is rescued by
overexpression of wild-type YHR122W. These treatments had no effect on the 
activity of the non-FeS enzyme alcohol dehydrogenase (ADH). Error bars 
represent standard deviation, n = 3. ***P < 0.001, Student’s t-test. 


(SDS-PAGE) analysis, where similar amounts of each protein were 
tested in a homogeneous background proteome representing a mix of 
Escherichia coli and human (MCF7 cell line) proteins. The two active 
protein designs ECH13 and ECH19 showed strong IA-labelling signals 
compared to inactive designs (Fig. 5a), and, in both cases, mutation of 
the active-site cysteine to alanine abolished labelling (Fig. 5b) and 
hydrolytic activity (data not shown). We next combined the proteomes 
containing all twelve protein designs, diluted them into a background 
human cell proteome, and analysed the mixture by isoTOP-ABPP. 
Notably, both ECH13 and ECH19 showed isoTOP-ABPP ratios that 
were equivalent to the most hyper-reactive cysteines in human and 
E. coli proteomes (R10:1 = 0.92 and 1.27, respectively), whereas the 
remaining inactive protein designs all showed higher ratios ranging 
from 1.88 to 6.11 (Fig. 5c and Supplementary Fig. 15b, c). These data 
thus reveal a strong correlation between cysteine hyper-reactivity and 
hydrolytic activity across a diverse panel of protein designs and 
designate heightened cysteine nucleophilicity as a key feature of suc- 
cessful cysteine hydrolase designs. 


Conclusions 


Here, we have described a quantitative method to profile the intrinsic 
reactivity of cysteine residues in native proteomes. Measurement of the 
rate of alkylation by IA (or other carbon electrophiles) has been used by 
enzymologists to assess the nucleophilicity of cysteine residues in 
individual, purified proteins. With isoTOP-ABPP, these studies can now be 
extended to quantitative, proteome-wide surveys of cysteine reactivity 
in complex biological systems. A key advantage of isoTOP-ABPP over 
more traditional proteomic methods that target cysteine-containing 
peptides is the use of an alkynylated IA probe in place of bulkier 
biotinylated reagents, which have shown an impaired ability to label 
cysteines in native proteins. Alkynylated IA probes, owing to their cell 
permeability, also afford the opportunity to perform cysteine reactivity 
profiling in living systems. In pilot experiments, we have found that a 
large fraction of hyper-reactive cysteines are labelled by the IA probe in 
living cells (Supplementary Fig. 16). Furthermore, isoTOP-ABPP selec- 
tively targets probe-accessible cysteines in native proteins. In this way, 
structural cysteines engaged in disulphide bonds or buried within the 
body of a protein are avoided to provide preferential access to a specific 
fraction of cysteines that are profoundly enriched in functionality (the 
IA probe labelled 1,082 out of a total of 8,910 cysteines present on the 
890 human proteins detected in this study). Projecting forward, it is 
possible that, by varying the nature of the electrophile, isoTOP-ABPP 
probes can be created that profile the reactivity of different subsets of 
cysteines, as well as other amino acids in proteomes, such as serine, 
threonine, tyrosine and glutamate/aspartate, which have also been 
shown to react with small-molecule probes. 

Figure 5 | Quantitative reactivity profiling predicts functional cysteines in 
designed proteins. a, In-gel fluorescence demonstrates robust IA labelling of 
two active cysteine hydrolases, ECH13 and ECH19, relative to inactive designs 
(top panel). Hydrolysis activities of ECH13 and ECH19 measured as the ratio of 
velocities in the presence versus the absence of purified enzymes were 
71.64 ± 6.94 and 104.15 ± 10.78, respectively (see Supplementary Fig. 15a for 
substrate hydrolysis assay). Other designs showed no measurable hydrolysis 

We discovered that hyper-reactivity can predict cysteine function 
in both native and designed proteins. The fact that hyper-reactivity 
was strongly correlated with catalytic activity in de novo designed 
cysteine hydrolases is interesting from the principles of both enzyme 
engineering and assay development, as it indicates that heightened 
cysteine nucleophilicity is a key feature of active catalysts and, accord- 
ingly, electrophile reactivity could serve as an effective primary screen 
for novel cysteine-dependent enzymes. We show that these screens 
can be performed directly in complex proteomes using either gel or 
MS (isoTOP-ABPP) detection platforms, thus offering a versatile and 
relatively high-throughput way to evaluate many protein designs in 
parallel. The isoTOP-ABPP platform has the additional advantage of 
reading out the relative cysteine reactivity of designs independent of 
their expression levels against a ‘background’ of native, hyper-reactive 
cysteines for comparison. isoTOP-ABPP might also offer a 
complementary way to perform cysteine reactivity/accessibility 
experiments that monitor protein stability and ligand interactions. 

The relationship between cysteine reactivity and functionality extends 
beyond nucleophilic catalysis to include other enzymatic activities 
(oxidative/reductive), as well as sites of electrophilic and oxidative 
modification. Quantitative reactivity profiling thus distinguishes itself as a 
complementary and perhaps more inclusive strategy to survey cysteine 
function compared to previous computational and experimental 
methods that focus on specific cysteine-based activities or modification 
events. Considering further that hyper-reactive cysteines corresponded 
to sites for glutathionylation, nitrosylation and HNE-modification, 
we speculate that cysteine nucleophilicity is a property that may have 
been selected for during evolution to offer points of protein control by 
oxidative stress pathways. Determining how the reactivity of cysteine 
residues is honed will require further investigation, but we anticipate 
that quantitative proteomic data, when integrated with the output of 
ongoing structural genomics programs, may eventually uncover unifying 
mechanistic principles that explain cysteine reactivity in proteins. In this 
regard, it is interesting to note that, although hyper-reactive cysteines did 
not conform to any obvious consensus sequence motifs, many of these 
residues were found at the N termini of α-helices (Supplementary Fig. 
17). This finding is consistent with literature reports ascribing a role 
for α-helix dipoles in the stabilization of cysteine thiolate anions. 

Figure 5 (continued) | activity over background (0.76 ± 0.058 nmol s⁻¹). Asterisks designate 
Coomassie blue signals for protein designs (lower panel). b, IA labelling is 
observed for ECH13 and ECH19, but not their active-site cysteine mutants 
C45A and C161A, respectively. c, Catalytic cysteines in ECH13 and ECH19 
show low isoTOP-ABPP ratios (red) compared with other designs (blue). 
Chromatographs are shown for peptides from the nine designs identified in this 
experiment (bottom panel), in the same order as shown in the top panel. 

Finally, it is important to stress that some functional cysteines may 
be inherently reactive, but inaccessible to our IA probe for steric 
reasons. Other cysteine-reactive electrophilic probes may prove 
more suitable for such cysteine residues. Also, hyper-reactivity is not 
necessarily a defining feature for all functional cysteines. Some 
enzymes with catalytic cysteines may, for instance, show reduced 
reactivity until they bind their physiological substrates or may rely 
more on substrate recognition than inherent catalytic power for 
function. This may be the case with the E1-activating and E2-conjugating 
enzymes, which recognize a specific class of ubiquitinated substrates 
and possess active-site cysteines that showed only moderate levels of 
electrophile reactivity (Supplementary Fig. 18). Other cysteines may 
have activities that are not dependent on their nucleophilicity. Our data 
do indicate, however, that those cysteines that are hyper-reactive in 
proteomes probably perform important catalytic and/or regulatory 
functions for their parent proteins. The large number of newly 
discovered residues that fall into this category foretells a broad role for 
hyper-reactive cysteines in mammalian biology. 


METHODS SUMMARY 

Probes and tags. The IA probe and the light and heavy variants of the 
azide-TEV-biotin tags were synthesized as previously described. 

Sample preparation, mass spectrometry and data analysis. For concentration- 
dependent experiments, proteome samples in PBS were probe labelled with the 
desired probe concentration for 1 h. Click chemistry was performed with either 
the light or heavy variants of the azide-TEV-biotin tags and the samples were 
mixed and subjected to streptavidin enrichment and subsequent trypsin and TEV 
digestion. The resulting TEV digests were analysed by Multidimensional Protein 
Identification Technology (MudPIT) on an LTQ-Orbitrap instrument. The 
resulting tandem MS data were searched using the SEQUEST algorithm against 
a concatenated target/decoy variant of the human, mouse and E. coli protein 
sequence databases. Quantification of light:heavy ratios (isoTOP-ABPP ratios, 
R) was performed using in-house software. Detailed information on sample 
preparation, mass spectrometry methods and data analysis is presented in 
Methods. 

Complementation of S. cerevisiae YHR122W deletion mutant. Complementary 
DNA encoding wild-type YHR122W was subcloned into the pESC-Leu vector 
(Stratagene). The YHR122W(C161A) mutant was generated using the QuikChange 
procedure (Stratagene). These constructs were introduced into a yeast Tet 
promoter Hughes (yTHC) strain harbouring a conditional (doxycycline-dependent) 
disruption in the YHR122W gene (Open Biosystems). Growth of these 
transformed cell lines on +gal/+dox media was monitored for 3 days. These cell lines 
were also used to monitor Leu1 and alcohol dehydrogenase (ADH) activity. 
Detailed information on the protocols used to subclone, transform and monitor 
the growth of the yeast strains and measure enzyme activity is available in 
Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


All compounds and reagents were purchased from Novabiochem, Sigma or 
Fisher, except where noted. 

Preparation of mouse proteomes. Mouse tissues (heart and liver) were harvested 
and immediately flash frozen in liquid nitrogen. The tissues were then Dounce 
homogenized in 1× PBS, pH 7.4. Centrifugation at 100,000g (45 min) provided 
soluble fractions (supernatant) and membrane fractions (pellet). Protein 
concentrations for each proteome were obtained using the Bio-Rad DC protein assay, and 
the proteomes were stored at −80 °C until use. 

Preparation of human cancer cell line proteomes. MDA-MB-231 cells were 
grown in L15 media supplemented with 10% fetal bovine serum at 37 °C in a 
CO2-free incubator. Jurkat cells and MCF7 cells were grown in RPMI-1640 
supplemented with 10% fetal bovine serum at 37 °C with 5% CO2. For in vitro labelling 
experiments, cells were grown to 100% confluency, washed three times with PBS 
and scraped in cold PBS. Cell pellets were isolated by centrifugation at 1,400g for 
3 min, and the cell pellets stored at −80 °C until further use. For in situ labelling of 
MDA-MB-231 and MCF7 cells, the cells were grown to 90% confluency, the 
media was removed and replaced with fresh media containing 10 µM IA probe. 
The cells were incubated at 37 °C for 1 h and harvested as detailed above. The 
harvested cell pellets were lysed by sonication and fractionated by centrifugation 
(100,000g, 45 min) to yield soluble and membrane proteomes. The proteomes 
were diluted to 2 mg ml⁻¹ and stored at −80 °C until use. 

Protein labelling and click chemistry. Proteome samples were diluted to a 2 mg 
protein/ml solution in PBS. Each sample (2 × 0.5 ml aliquots) was treated with 10, 
20, 50 or 100 µM of IA probe using 5 µl of a 1, 2, 5 or 10 mM stock in DMSO. The 
labelling reactions were incubated at room temperature (25 °C) for 1 h. Click 
chemistry was performed by the addition of 150 µM of either the light TEV tag 
or heavy TEV tag (15 µl of a 5 mM stock), 1 mM tris(2-carboxyethyl)phosphine 
(TCEP; fresh 50× stock in water), 100 µM ligand (17× stock in DMSO:t-butanol 
1:4) and 1 mM CuSO4 (50× stock in water). Samples were allowed to react at 
room temperature for 1 h. After the click chemistry step, the light- and 
heavy-labelled samples were mixed together and centrifuged (5,900g, 4 min, 4 °C) to 
pellet the precipitated proteins. The pellets were washed twice in cold MeOH, 
after which the pellet was solubilized in PBS containing 1.2% SDS via sonication 
and heating (5 min, 80 °C). 
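The stock volumes quoted for probe labelling and click chemistry follow from the usual dilution relation C1·V1 = C2·V2. The short Python sketch below is only an illustration: the function name `final_conc_um` and the option to include the added stock volume are our assumptions, not part of the published protocol. It reproduces the nominal concentrations stated in the text.

```python
def final_conc_um(stock_mm, stock_ul, sample_ul, include_added_volume=False):
    """Final concentration (uM) after adding stock_ul of a stock_mm (mM) stock
    to a sample of sample_ul. By default the small added volume is ignored,
    which gives the nominal concentrations quoted in the protocol."""
    total_ul = sample_ul + stock_ul if include_added_volume else sample_ul
    return stock_mm * 1000.0 * stock_ul / total_ul


# 5 ul of a 1, 2, 5 or 10 mM IA-probe stock added to a 0.5 ml aliquot
for stock_mm in (1, 2, 5, 10):
    print(stock_mm, "mM stock ->", final_conc_um(stock_mm, 5, 500), "uM (nominal)")

# 15 ul of a 5 mM TEV-tag stock added to a 0.5 ml aliquot
print("TEV tag:", final_conc_um(5, 15, 500), "uM (nominal)")

# Accounting for the added volume changes the value only slightly
print("IA probe, exact:", round(final_conc_um(10, 5, 500, True), 2), "uM")
```

Ignoring the added volume is a common convention for small additions; including it shifts the 100 µM condition by about 1%.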

For time course experiments, proteome samples were labelled with 100 µM of 
IA probe (using 5 µl of a 10 mM stock in DMSO). After 6 min of probe labelling, 
an aliquot of the reaction was quenched by passaging the sample through a 
NAP-5 column (GE Healthcare) to remove excess, unreacted probe. After 60 min of 
probe labelling, the other sample was quenched as before and click chemistry was 
performed as described earlier. 

Streptavidin enrichment of probe-labelled proteins. The SDS-solubilized, 
probe-labelled proteome samples were diluted with 5 ml of PBS for a final SDS concentration 
of 0.2%. The solutions were then incubated with 100 µl of streptavidin-agarose beads 
(Pierce) for 3 h at room temperature. The beads were washed with 10 ml 0.2% SDS/PBS, 
3 × 10 ml PBS and 3 × 10 ml H2O and the beads were pelleted by centrifugation 
(1,300g, 2 min) between washes. 

On-bead trypsin and TEV digestion. The washed beads described earlier were 
suspended in 500 µl of 6 M urea/PBS and 10 mM TCEP (from 20% stock in H2O) 
and placed in a 65 °C heat block for 15 min. Twenty millimolar iodoacetamide 
(from 50× stock in H2O) was then added and allowed to react at 37 °C for 30 min. 
Following reduction and alkylation, the beads were pelleted by centrifugation 
(1,300g, 2 min) and resuspended in 200 µl of 2 M urea/PBS, 1 mM CaCl2 
(100× stock in H2O) and trypsin (2 µg). The digestion was allowed to proceed 
overnight at 37 °C. The digest was separated from the beads using a Micro 
Bio-Spin column and the beads were then washed with 3 × 500 µl PBS, 3 × 500 µl 
H2O and 1 × 150 µl of TEV digest buffer. The washed beads were then 
resuspended in 150 µl of TEV digest buffer with AcTEV Protease (Invitrogen, 5 µl) for 
12 h at 29 °C. The eluted peptides were separated from the beads using a Micro 
Bio-Spin column and the beads washed with H2O (2 × 75 µl). Formic acid (15 µl) 
was added to the sample, which was stored at −20 °C until MS analysis. 

Liquid-chromatography-mass-spectrometry (LC-MS) analysis. LC-MS/MS 
analysis was performed on an LTQ-Orbitrap mass spectrometer (ThermoFisher) 
coupled to an Agilent 1100 series high-performance liquid chromatography system. 
TEV digests were pressure loaded onto a 250 µm fused silica desalting column 
packed with 4 cm of Aqua C18 reverse phase resin (Phenomenex). The peptides 
were then eluted onto a biphasic column (100 µm fused silica with a 5 µm tip, 
packed with 10 cm C18 and 3 cm Partisphere strong cation exchange resin (SCX, 
Whatman)) using a gradient of 5-100% buffer B in buffer A (buffer A: 95% water, 5% 
acetonitrile, 0.1% formic acid; buffer B: 20% water, 80% acetonitrile, 0.1% formic 
acid). The peptides were then eluted from the SCX onto the C18 resin and into the 
mass spectrometer using four salt steps as previously described. The flow rate 
through the column was set to ~0.25 µl min⁻¹ and the spray voltage was set to 
2.75 kV. One full MS scan (FTMS) (400-1,800 MW) was followed by 18 
data-dependent scans (ITMS) of the nth most intense ions with dynamic exclusion 
disabled. 

Peptide identification. The tandem MS data were searched using the SEQUEST 
algorithm against a concatenated target/decoy variant of the human and mouse 
International Protein Index databases. A static modification of +57.02146 on 
cysteine was specified to account for iodoacetamide alkylation and differential 
modifications of +464.28596 (light probe modification) and +470.29977 (heavy 
probe modification) were specified on cysteine to account for probe 
modifications with either the light or heavy variants of the IA-probe-TEV adduct. 
SEQUEST output files were filtered using DTASelect 2.0. Reported peptides 
were required to be fully tryptic and contain the desired probe modification, and 
discriminant analyses were performed to achieve a peptide false-positive rate 
below 5%. The actual false-positive rate was assessed at this stage according to 
established guidelines and found to be ~3.5%. Additional assessments of the 
false-positive rate were performed following the application of additional filters 
(described later), resulting in a final false-positive rate below 0.05%. 
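Two quantities in the paragraph above lend themselves to a quick worked example: the mass offset between the light and heavy probe adducts, and a target/decoy estimate of the false-positive rate. The Python sketch below is illustrative only. The adduct masses are taken from the text, but the formula in `decoy_fdr` (twice the decoy hits divided by all hits) is the conventional estimate for concatenated target/decoy searches and is our assumption, since the text does not state the exact calculation; the hit counts are hypothetical.

```python
LIGHT_ADDUCT = 464.28596  # Da, light probe modification (from the text)
HEAVY_ADDUCT = 470.29977  # Da, heavy probe modification (from the text)


def pair_spacing_mz(charge):
    """m/z spacing between the light and heavy forms of a peptide carrying
    one probe-modified cysteine, for a given charge state."""
    return (HEAVY_ADDUCT - LIGHT_ADDUCT) / charge


def decoy_fdr(n_decoy, n_total):
    """Assumed estimate for a concatenated target/decoy search:
    2 x decoy hits / all accepted hits."""
    if n_total == 0:
        raise ValueError("no identifications to assess")
    return 2.0 * n_decoy / n_total


for z in (1, 2, 3):
    print("charge", z, "-> light/heavy spacing", round(pair_spacing_mz(z), 5), "m/z")

# Hypothetical counts chosen only to illustrate the arithmetic
print("estimated false-positive rate:", decoy_fdr(n_decoy=35, n_total=2000))
```

The mass difference of about 6.014 Da between the two adducts is what separates each light:heavy peak pair in the MS1 spectra used for ratio quantification.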

Ratio quantification. Quantification of light/heavy ratios (isoTOP-ABPP ratios, 
R) was performed using in-house software written in the R programming language 
that utilizes routines from the open-source XCMS package for MS data analysis 
to read in raw chromatographic data in the mzXML format. Each experiment 
consisted of two LC/LC-MS/MS runs: light:heavy 10 µM:10 µM, and light:heavy 
100 µM:10 µM IA-probe concentration. Both runs were searched using SEQUEST 
and filtered with DTASelect as described earlier. Because the mass spectrometer 
was configured for data-dependent fragmentation, peptides are not always 
identified in every run. As such, peptides were identified in either (1) only the 
10 µM:10 µM run, (2) only the 100 µM:10 µM run, or (3) both runs. In the case of 
peptides that were sequenced in both runs, identification of the corresponding 
peaks was made by choosing peaks that co-elute with the peptide identification. In 
the case of probe-modified peptides that were sequenced in one, but not the other 
run, an algorithm was developed to identify the corresponding peak in the run 
without the SEQUEST identification. To accomplish this, the retention time of the 
'reference' peptide is used to position a retention time window (±10 min) across 
the run lacking a peptide identification. Extracted ion chromatograms (±10 
p.p.m.) of the target peptide m/z with both 'light' and 'heavy' modifications are 
generated within that window. The program then searches for candidate co-eluting 
pairs of light:heavy MS1 peaks, and for each candidate pair calculates the ratio of 
integrated peak area between the light and heavy peaks. Several filters are used to 
ensure that the correct peak pair is identified. First, the extent of co-elution for each 
peak pair is quantified using a Pearson correlation, an established method to gauge 
elution profile similarity. Second, the predicted pattern of the isotopic envelope of 
the target peptide is generated and compared to the observed high-resolution MS1 
spectrum. This comparison generates an 'envelope correlation score' (Env) that 
also enables confirmation of the monoisotopic mass and charge state of each 
candidate peak. Peak pairs that have poor co-elution scores, or that have the 
incorrect monoisotopic mass or charge, or whose isotopic envelopes are not well 
correlated with the predicted envelope are eliminated from consideration. After 
application of these filters, in the rare case that multiple candidates still exist, then 
no peak is chosen and a ratio is not recorded. Usually, however, application of these 
filters results in a single candidate peak pair and the ratio for this peak pair is 
recorded for the peptide in the corresponding run. In this way, each experiment 
yields two ratios, one for the 10 µM:10 µM run and one for the 100 µM:10 µM run. 
Following application of these filters, the false-positive rate was reassessed, and 
found to be less than 0.05% in all cases. 
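The co-elution filter and area-ratio step described above can be sketched in a few lines. The following Python example is a simplified illustration and not the in-house R software: the function names, the synthetic Gaussian chromatograms and the co-elution threshold of 0.8 are all assumptions made for the example, and the isotopic-envelope filter is omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally sampled intensity traces."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def trapezoid_area(times, intensities):
    """Integrated peak area by the trapezoidal rule."""
    return sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def light_heavy_ratio(times, light, heavy, min_coelution=0.8):
    """Light:heavy area ratio, or None if the pair fails the co-elution filter.
    The 0.8 threshold is an assumed value for illustration."""
    if pearson(light, heavy) < min_coelution:
        return None
    heavy_area = trapezoid_area(times, heavy)
    if heavy_area <= 0:
        return None
    return trapezoid_area(times, light) / heavy_area


def gaussian(times, centre, width, height):
    """Synthetic chromatographic peak used only for this example."""
    return [height * math.exp(-0.5 * ((t - centre) / width) ** 2) for t in times]


times = [i * 0.05 for i in range(200)]          # retention time axis, minutes
light = gaussian(times, 5.0, 0.2, 1000.0)
heavy = gaussian(times, 5.0, 0.2, 500.0)        # co-elutes with the light peak
shifted = gaussian(times, 7.0, 0.2, 500.0)      # elutes 2 min later

print(light_heavy_ratio(times, light, heavy))    # co-eluting pair yields a ratio
print(light_heavy_ratio(times, light, shifted))  # rejected by the filter
```

In this toy case the co-eluting pair gives a ratio of about 2, while the shifted candidate is discarded, mirroring the rule that no ratio is recorded when a pair fails the filters.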

After ratios for unique peptide entries are calculated for each experiment, 
overlapping peptides with the same labelled cysteine (for example, same local 
sequence around the labelled cysteines but different charge states, MudPIT seg- 
ment numbers, or tryptic termini) are grouped together, and the median ratio 
from each group is reported as the final ratio (R). All of these values can be found 
in Supplementary Tables 1, 2 and 3 and representative chromatographs can be 
seen in Supplementary Table 7. Raw result files of peptide identification using 
SEQUEST can be found in Supplementary Table 9. 
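The grouping step described above amounts to collecting every peptide-level ratio that reports on the same labelled cysteine and taking the median. A minimal Python sketch follows; keying the groups by a (protein, cysteine position) pair is our simplifying assumption (the text groups by local sequence around the labelled cysteine), and the protein names and ratios are hypothetical.

```python
from collections import defaultdict
from statistics import median


def final_ratios(peptide_ratios):
    """Group peptide-level ratios by labelled cysteine and report the median.

    peptide_ratios: iterable of (protein, cysteine_position, ratio) tuples,
    one per unique peptide entry (different charge states, MudPIT segments
    or tryptic termini may all map to the same cysteine).
    """
    groups = defaultdict(list)
    for protein, cys_pos, ratio in peptide_ratios:
        groups[(protein, cys_pos)].append(ratio)
    return {site: median(values) for site, values in groups.items()}


# Hypothetical peptide entries, for illustration only
example = [
    ("PROT_A", 101, 0.9),
    ("PROT_A", 101, 1.1),
    ("PROT_A", 101, 1.4),
    ("PROT_B", 45, 6.2),
    ("PROT_B", 45, 5.8),
]

for site, ratio in final_ratios(example).items():
    print(site, ratio)
```

Using the median rather than the mean keeps a single poorly quantified peptide from dominating the reported ratio for a cysteine.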

Functional annotation of labelled cysteines. For automated functional analyses, 
custom Perl scripts were developed to query the UniProtKB/Swiss-Prot Protein 
Knowledgebase release 57.4 (current as of 16 June 2009). Sequence annotation in 
the (Features) section of the relevant UniProt entry was mined and any annota- 
tion corresponding to the labelled residue was collected. This functional annota- 
tion in its entirety can be found in Supplementary Tables 4 and 5. 
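The annotation lookup described above can be pictured as a simple filter over an entry's feature table. The sketch below is written in Python rather than the Perl used by the authors, works on hand-written records instead of a parsed UniProt file, and every record and name in it is hypothetical; it only shows the residue-matching logic.

```python
def annotations_for_residue(features, position):
    """Return the feature records whose span covers the labelled residue."""
    return [f for f in features if f["start"] <= position <= f["end"]]


# Hypothetical feature records, not taken from any real UniProt entry
features = [
    {"key": "ACT_SITE", "start": 25, "end": 25, "note": "Nucleophile"},
    {"key": "MOD_RES", "start": 25, "end": 25, "note": "S-nitrosocysteine"},
    {"key": "METAL", "start": 60, "end": 60, "note": "Zinc"},
]

hits = annotations_for_residue(features, 25)
print([(f["key"], f["note"]) for f in hits])
```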

Recombinant PRMT1 protein expression and purification. Full-length cDNA 
encoding human PRMT1 in pOTB7 was purchased from Open BioSystems and 
subcloned into pET-45b(+) (Novagen). BL21(DE3) E. coli containing this vector 
was grown in LB media containing 75 mg l⁻¹ carbenicillin with shaking at 37 °C 
to an OD600 nm of 0.5. The cells were then induced with 1 mM 
isopropyl-β-D-thiogalactoside (IPTG) and harvested 4 h later by centrifugation. Cells were lysed 
by stirring for 20 min at 4 °C in 50 mM Tris-HCl (pH 8.0) with 150 mM NaCl and 
supplemented with 1 mg ml⁻¹ lysozyme and 1 mg ml⁻¹ DNase I. The lysate was 
then sonicated and centrifuged at 10,000g for 10 min. Talon cobalt affinity resin 
(Clontech; 400 µl of slurry per gram of cell paste) was added to the supernatant, 
and the mixture was rotated at 25 °C for 30 min. Beads were collected by 
centrifugation at 700g for 3 min, washed twice with Tris buffer, and applied to a 1-cm 
column. The column was washed twice with Tris buffer (10 ml per 400 µl of resin 
slurry) and once with Tris buffer containing 500 mM NaCl. The bound protein was eluted by 
the addition of 100 mM imidazole (2 ml per 400 µl of resin). Imidazole was 
removed by passage over a Sephadex G-25M column (GE Healthcare), and the 
eluate was concentrated using an Amicon centrifugal filter device (Millipore). 
Protein concentration was determined using the Bio-Rad DC protein assay kit. 
These conditions yielded PRMT1 at approximately 0.5 mg l⁻¹ of culture. A 
C101A mutation was introduced into the pET-45b(+) construct described earlier 
using the QuikChange Site-Directed Mutagenesis Kit (Stratagene), and the 
resulting mutant protein was expressed identically and isolated with a similar yield. 

In-gel fluorescence characterization of PRMT1. Thirteen micrograms of 
recombinant PRMT1 (wild type or C101A mutant) in 50 µl PBS buffer was 
pre-incubated with 0, 25 or 50 µM HNE (Calbiochem, 50 mM stock in ethanol) 
for 1 h at room temperature and was then labelled with 100 nM of the IA probe 
(5 µM stock in DMSO) and the reactions incubated for 1 h at room temperature. 
Click chemistry was performed with 20 µM rhodamine-azide, 1 mM TCEP, 
100 µM TBTA ligand and 1 mM CuSO4. The reaction was allowed to proceed 
at room temperature for 1 h before quenching with 50 µl of 2× SDS-PAGE 
loading buffer (reducing). Quenched reactions were separated by SDS-PAGE 
(30 µl of sample/lane) and visualized in-gel using a Hitachi FMBio IIe flatbed 
laser-induced fluorescence scanner (MiraiBio). 

PRMT1 in vitro methylation assays. Five-hundred nanograms of recombinant 
human PRMT1 (wild type or C101A mutant) was pre-incubated with HNE 
(Calbiochem) for 30 min and methylation activity was monitored after addition 
of 1 mg of recombinant histone 4 (M2504S; NEB) and SAM (2 Ci) in 
methylation buffer (20 mM Tris, pH 8.0, 200 mM NaCl, 0.4 mM EDTA). Reactions were 
incubated for 90 min at 30 °C and stopped with SDS sample buffer. SDS-PAGE 
gels were fixed with 10% acetic acid/10% methanol v/v, washed, and incubated 
with Amplify reagent (Amersham) before exposing at −80 °C. 

Complementation of S. cerevisiae YHR122W deletion mutant. A cDNA 
encoding YHR122W was purchased as a full-length expressed sequence tag 
(Open Biosystems). The construct for subcloning into the yeast epitope tagging 
vector pESC-Leu (Stratagene) was generated by polymerase chain reaction (PCR) 
from the corresponding cDNA using the following primers: sense primer, 
5'-GAAGCGGCCGCAATGTCTGAGTTTTTGAATGA-3'; antisense primer, 
5'-CCGACTAGTGCCTTACAAGTCACTAACATCTTAG-3'. 

The PCR product was digested with NotI-SpeI and subcloned into a 
NotI-SpeI-digested pESC-Leu vector and sequenced. The YHR122W(C161A) mutant was 
generated using the QuikChange procedure (Stratagene). The mutant cDNA was 
sequenced and found to contain only the desired mutation. 

Constructs containing wild-type and C161A mutant YHR122W were intro- 
duced into the yTHC strain YSC1180-7428770 (Open Biosystems) using the 
reagents provided in the Yeastmaker Yeast Transformation System 2 (Clontech). 
The yeast was grown in synthetic dextrose minimal medium (−Leu) and spot 
assays were performed in either synthetic dextrose minimal medium (−Leu) or 
synthetic galactose minimal medium (−Leu) + agar plates + 50 µg ml⁻¹ 
doxycycline. The plates were cultured at 30 °C for 3 days. 

Isopropylmalate isomerase (Leu1) assay. Yeast strains harbouring either an 
empty vector or wild-type YHR122W (see earlier section) were cultured in 
synthetic dextrose minimal medium (−Leu) to an OD600 nm of 1.0 and transferred into 
synthetic galactose minimal medium (−Leu) + 50 µg ml⁻¹ doxycycline for 12 h. 
Yeast were lysed and Leu1 semi-purified by ammonium sulphate precipitation 
(40-70%). The activity assays were performed using DL-threo-3-isopropylmalic 
acid as the substrate and product formation was measured by monitoring 
absorbance at 235 nm for 10 min. 

ADH assay. Yeast cell lysates in 0.1 M sodium pyrophosphate buffer (pH 9.2, 
1.5 ml) were treated with 2 M ethanol (0.5 ml) and 0.025 M NAD (1.0 ml) and 
ADH activity was measured by absorbance increase at 340 nm for 3 min. 
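An absorbance increase at 340 nm reports NADH formation, so a measured slope can be converted to a rate with the Beer-Lambert law. The Python sketch below is an illustration under stated assumptions: it uses the standard NADH molar absorption coefficient at 340 nm (6,220 M⁻¹ cm⁻¹), a 1 cm path length and a 3.0 ml reaction volume (the sum of the three volumes listed above). None of these conversion details is given in the text, and the example slope is hypothetical.

```python
NADH_EXT_COEFF = 6220.0  # M^-1 cm^-1 at 340 nm, standard literature value


def adh_activity_umol_per_min(delta_a340_per_min, volume_ml=3.0, path_cm=1.0):
    """Convert an A340 slope (absorbance units per minute) into micromoles of
    NADH formed per minute, assuming Beer-Lambert behaviour."""
    molar_per_min = delta_a340_per_min / (NADH_EXT_COEFF * path_cm)
    return molar_per_min * (volume_ml / 1000.0) * 1e6


# Hypothetical slope chosen to give round numbers
print(adh_activity_umol_per_min(0.0622))
```

Dividing such a rate by the amount of lysate protein in the assay would give a specific activity that can be compared between strains.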

De novo designs of cysteine hydrolases and hydrolysis activity assays. We used 
the Rosetta computational enzyme design methodology to search a set of 
protein scaffolds for constellations of backbones capable of supporting an 
idealized transition state for ester hydrolysis derived from the geometries and 
mechanisms of natural cysteine hydrolases. The idealized active-site models 
feature a nucleophilic cysteine, a general base/acid histidine and at least one 
side-chain or backbone hydrogen bond donor as the oxyanion hole. The sequence 
of residues surrounding the putative active sites was optimized using the Rosetta 
design algorithm to maximize transition state stabilization. A set of 12 designed 
proteins in 10 distinct scaffolds was chosen for experimental characterization. For 
each designed protein, synthetic genes were obtained and protein expression and 
purification was performed in E. coli as previously described. Activity was 
measured with the substrate by following the initial (<5% substrate conversion) 
increase in fluorescence due to the appearance of the product coumarin. A protein 
concentration of 20 µM and substrate concentration of 100 µM were used in 
25 mM HEPES buffer, 150 mM NaCl, 1 mM TCEP, pH 7.5. The background rate 
was measured under identical conditions but without the protein. Kunkel muta- 
genesis was used for creating point mutations in the active-site residues. A 
detailed description of the design and characterization of the cysteine hydrolases 
will be presented elsewhere. Amino acid sequences of the 12 designs can be found 
in Supplementary Information. 

In-gel fluorescence and isoTOP-ABPP characterization of designed proteins. 
For in-gel fluorescence studies, E. coli lysates overexpressing the designed proteins 
were diluted to 2 mg protein/ml in PBS. Each sample (25 µl) was mixed with 25 µl 
of MCF7 human cell soluble proteome (2 mg ml⁻¹) and was labelled with 100 nM 
of the IA probe (5 µM stock in DMSO) and the reactions incubated for 1 h at 
room temperature. Click chemistry, SDS-PAGE separation and in-gel 
fluorescence visualization were performed as described in previous sections. 

For isoTOP-ABPP studies, 10 µl of each of the E. coli lysates (2 mg protein/ml) 
overexpressing the designed constructs were mixed together and the total volume 
was brought to 1 ml by the addition of 2 mg ml⁻¹ of MCF7 soluble proteome. 
Time-dependent and concentration-dependent labelling with the IA probe, click 
chemistry, on-bead trypsin and TEV digestions, LC-MS runs and MS data ana- 
lysis were performed as described in previous sections. 


42. Tabb, D. L., McDonald, W. H. & Yates, J. R. III. DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 1, 21-26 (2002). 
43. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods 4, 207-214 (2007). 
44. Collins, S. R. et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics 6, 439-450 (2007). 
45. Pedrioli, P. G. A. et al. A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnol. 22, 1459-1466 (2004). 
46. Park, S. K., Venable, J. D., Xu, T. & Yates, J. R. A quantitative analysis software tool for mass spectrometry-based proteomics. Nature Methods 5, 319-322 (2008). 
47. Vallee, B. L. & Hoch, F. L. Zinc, a component of yeast alcohol dehydrogenase. Proc. Natl Acad. Sci. USA 41, 327-338 (1955). 
48. Zanghellini, A. et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Sci. 15, 2785-2794 (2006). 
49. Ma, S., Devi-Kesavan, L. S. & Gao, J. Molecular dynamics simulations of the catalytic pathway of a cysteine protease: a combined QM/MM study of human cathepsin K. J. Am. Chem. Soc. 129, 13633-13645 (2007). 
50. Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387-1391 (2008). 


©2010 Macmillan Publishers Limited. All rights reserved 




doi:10.1038/nature09601 


A lower limit of Δz > 0.06 for the duration of the reionization epoch 


Judd D. Bowman & Alan E. E. Rogers 


Observations of the 21-centimetre line of atomic hydrogen in the 
early Universe directly probe the history of the reionization of the 
gas between galaxies. The observations are challenging, though, 
because of the low expected signal strength (~10 mK), and 
contamination by strong (>100 K) foreground synchrotron emission in 
the Milky Way and extragalactic continuum sources. If 
reionization happened rapidly, there should be a characteristic signature 
visible against the smooth foreground in an all-sky spectrum. Here 
we report an all-sky spectrum between 100 and 200 MHz, 
corresponding to the redshift range 6 < z < 13 for the 21-centimetre line. 
The data exclude a rapid reionization timescale of Δz < 0.06 at the 
95% confidence level. 

The observable differential brightness temperature caused by the 
redshifted 21-cm line from a volume of hydrogen gas in the intergalactic 
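medium can be calculated from basic principles and is: 


$$\delta T_b(\theta, z) \approx 27\,(1+\delta)\,x_{\mathrm{HI}}\left(\frac{T_S - T_\gamma}{T_S}\right)\left(\frac{1+z}{10}\right)^{1/2}\ \mathrm{mK} \qquad (1)$$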


where θ is the position on the sky, z is the redshift of the gas, the factor of 
27 mK comes from cosmological factors, δ is the local matter 
overdensity of the gas, x_HI is the neutral fraction of the gas, T_S is the 'spin' 
temperature that describes the relative population of the ground and 
excited states of the hyperfine transition, and T_γ is the temperature of 
the cosmic microwave background (CMB) radiation. The intensity of 
the 21-cm emission or absorption relative to the CMB has a strong 
dependence on the neutral fraction and the spin temperature, both of 
which are sensitive to the ultraviolet and X-ray radiation from the 
formation of luminous sources, including early stars, galaxies and black 
holes. 

During the reionization epoch, after the heating of the intergalactic 
medium, the spin temperature is expected to be much larger than the 
CMB temperature (T_S ≫ T_γ) and the 21-cm perturbations will be seen
in emission against the CMB and dominated by variations in the 
neutral fraction. Under the additional assumption that the local neutral 
fraction of the gas is not correlated to the local matter overdensity, the 
angle-averaged form of equation (1) can be reduced to: 


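$$\langle \delta T_{21}(\theta, z) \rangle_\theta = \delta\bar{T}_{21}(z) \approx 27 \left(\frac{1+z}{10}\right)^{1/2} \bar{x}_{\rm HI}(z)\ \mathrm{mK} \qquad (2)$$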


where we have explicitly written the redshift dependence of the mean
neutral fraction as x̄_HI(z). This 'global' 21-cm signal should be observable
in a measurement of the all-sky low-frequency radio spectrum through
the mapping from redshift to frequency for the 21-cm line, according to:
ν = 1,420/(1 + z) MHz. Here, 1,420 MHz is the rest-frame frequency of
the 21-cm line and the redshift range appropriate for the reionization
epoch is z > 6. Fixing the overall amplitude factor in equation (2)
through the choice of a particular cosmological model, for example,
the WMAP7 best-fit ΛCDM model, the global 21-cm signal becomes
a direct probe of the evolution of the mean neutral fraction of hydrogen
gas in the intergalactic medium during the reionization epoch.
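
As a rough numerical illustration of the redshift-to-frequency mapping and of equation (2), the short Python sketch below evaluates both for a few redshifts. The function names are illustrative and do not come from the paper; the 27-mK prefactor and the 1,420-MHz rest frequency are taken from the text as quoted, and the amplitude is only meaningful under the stated assumption that the spin temperature greatly exceeds the CMB temperature.

```python
import math

REST_FREQ_MHZ = 1420.0  # rest-frame frequency of the 21-cm line, as quoted in the text


def redshift_to_freq_mhz(z):
    """Observed frequency (MHz) of 21-cm radiation emitted at redshift z."""
    return REST_FREQ_MHZ / (1.0 + z)


def freq_mhz_to_redshift(freq_mhz):
    """Inverse mapping: redshift at which the 21-cm line is observed at freq_mhz."""
    return REST_FREQ_MHZ / freq_mhz - 1.0


def global_signal_mk(z, mean_neutral_fraction):
    """Equation (2): 27 * sqrt((1 + z) / 10) * x_HI, in mK."""
    return 27.0 * math.sqrt((1.0 + z) / 10.0) * mean_neutral_fraction


for z in (6.0, 9.0, 13.0):
    nu = redshift_to_freq_mhz(z)
    amp = global_signal_mk(z, 1.0)  # fully neutral gas, x_HI = 1
    print(f"z = {z:4.1f}  nu = {nu:6.1f} MHz  dT21 = {amp:5.1f} mK")
```

For fully neutral gas the amplitude stays within a few tens of millikelvin across 6 < z < 13, which is why the signal sits several orders of magnitude below foregrounds of order 100 K or more.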


The global 21-cm signal is challenging to observe in practice because 
the low-frequency radio sky is dominated by intense synchrotron emis- 
sion from our own Galaxy that is more than four orders of magnitude 
brighter than the signal. Galactic and extragalactic free-free emission 
provide additional foregrounds, as do numerous radio point sources
from active galactic nuclei, radio galaxies and local Galactic objects.
Radio-frequency interference from television, FM (frequency modulated)
radio, low-Earth-orbit satellites and other telecommunications transmitters
is prolific and can be eight or ten orders of magnitude brighter
than the astrophysical signal, even in geographically remote areas.

We deployed a custom-built, high-dynamic-range broadband radio 
spectrometer, called EDGES, at the Murchison Radio-astronomy
Observatory in Western Australia, to measure the radio spectrum 
between 100 and 200 MHz. The instrument observed continuously for 
three months with low duty cycle and yielded the spectrum shown in 
Fig. 1—an average over nearly the entire southern celestial hemisphere. 
Figure 1 | Measured spectrum between 100 and 195 MHz. The spectrum
corresponds to redshifts 13 > z > 6. The grey spikes are spectral channels that 
experienced radio-frequency interference during the integration and are 
masked from the analysis. The shape and amplitude of the spectrum are 
dominated by Galactic synchrotron emission and modulated by the 
uncalibrated antenna bandpass, which causes the spectrum to roll off from the 
characteristic T_F ∝ ν^-2.5 power-law form of the foregrounds at low and high
frequencies. Any global 21-cm contribution in the spectrum is at the 20-30 mK 
level, approximately four orders of magnitude below the visible foreground 
emission. Thermal noise in the spectrum is 6 mK at 150 MHz using 1-MHz 
binned spectral resolution. The thermal noise increases at lower frequencies 
owing to the larger sky noise and lowered transmission efficiency of the 
antenna. Any 20-MHz sub-band in this spectrum can be fitted by a fifth-order 
polynomial, leaving residuals at or below the thermal noise level. 
To overcome the foreground signal, we relied on the expectation that all
of the foregrounds have smooth continuum spectra that are well modelled
as simple power laws in frequency (or redshift), and that any uncertainties
in the calibration of the spectrometer could introduce only
smooth spectral deviations into the measurement. We designed the 
instrument to accomplish this goal by shortening the electrical path 
length between the antenna and the internal calibration source to less 
than a wavelength and by using an electrically compact dipole antenna. 
Additional details on the design of the instrument are presented in the 
Supplementary Information. 

The observed spectrum was fitted by a model that consists of a 21-cm
signal term and a polynomial term that accounts for the foregrounds
and calibration uncertainties according to:

$$T_{\rm obs}(z) = \delta\bar{T}_{21}(z) + T_F(z),$$

where $T_F(z) = \sum_{n=0}^{m} a_n z^n$ is the foreground term. The 21-cm term is
given by equation (2) with

$$\bar{x}_{\rm HI}(z) = \frac{1}{2}\left[\tanh\left(\frac{z - z_r}{\Delta z}\right) + 1\right]$$

following the recent convention of the WMAP7 analysis. The free
parameters in the 21-cm model are the reionization redshift z_r when
the transition reaches 50% and the duration of reionization
$\Delta z = (d\bar{x}_{\rm HI}/dz)^{-1}|_{\bar{x}_{\rm HI}=0.5}$. The model is sufficient to account for all
of the visible features in the observed spectrum after spectral channels
with radio-frequency interference have been masked from the data set.
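
The structure of this model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' fitting code: it assumes the tanh argument is (z - z_r)/Δz and that the step rises from 0 (ionized) at low redshift to 1 (neutral) at high redshift, and all function names are hypothetical. Only evaluation of the model is shown; the least-squares fit itself is omitted.

```python
import math


def mean_neutral_fraction(z, z_r, delta_z):
    """Tanh step: ~0 well below z_r (ionized), ~1 well above z_r (neutral)."""
    return 0.5 * (math.tanh((z - z_r) / delta_z) + 1.0)


def signal_21cm_mk(z, z_r, delta_z):
    """Equation (2) evaluated with the tanh neutral fraction, in mK."""
    return 27.0 * math.sqrt((1.0 + z) / 10.0) * mean_neutral_fraction(z, z_r, delta_z)


def foreground_mk(z, coeffs):
    """Polynomial foreground sum_n a_n z**n, with coeffs = [a_0, ..., a_m]."""
    total = 0.0
    for a_n in reversed(coeffs):  # Horner's scheme
        total = total * z + a_n
    return total


def model_mk(z, z_r, delta_z, coeffs):
    """Full model: 21-cm step plus smooth polynomial foreground."""
    return signal_21cm_mk(z, z_r, delta_z) + foreground_mk(z, coeffs)


# A rapid transition (delta_z = 0.06) centred on z_r = 9 with no foreground:
for z in (8.8, 8.95, 9.0, 9.05, 9.2):
    print(f"z = {z:5.2f}  signal = {signal_21cm_mk(z, 9.0, 0.06):6.2f} mK")
```

A sharp step of this kind cannot be absorbed by a low-order polynomial over a 20-MHz band, which is what lets the fit separate a rapid transition from the smooth foreground.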

We fitted the model to all available trials of z_r in the observed spectrum
using an approximately 20-MHz subset of the spectrum centred
on ν_r = 1,420/(1 + z_r) MHz. In practice, we found that the order of
polynomial that yields the best results is dependent on the trial redshift
because the magnitude of the systematic structure in the spectrum
varies with frequency. We used m = 4 for z_r < 9 and m = 5 for z_r > 9.
By assuming that reionization was equally likely to have occurred at any
redshift in the range 6 < z_r < 13 and treating each frequency trial as an
independent measurement, the observations exclude reionization
histories shorter than Δz < 0.19 at 68% statistical confidence.
Systematic uncertainty is estimated through inspection of the distribution
function of best-fit derivatives, Δz^-1, from all frequency trials.
Under a null hypothesis, we expect the distribution to peak at Δz^-1 = 0
with large deviations indicative of systematic errors. We set the
systematic error as the 68th percentile of the derivative distribution,
corresponding to Δz_sys = 0.21. Combining statistical and systematic
uncertainties in derivative space in quadrature yields our final confidence
bounds of Δz < 0.13 at 68% combined confidence and
Δz < 0.06 at 95% confidence. The excluded duration bounds as a
function of reionization redshift z_r are plotted in Fig. 2. These constraints
are sufficient to rule out the most rapid plausible reionization
histories, although more general theoretical expectations currently
yield predictions of 1 < Δz < 10. Our result extends findings by
WMAP, which ruled out a fiducial instantaneous transition at
z < 7 and yielded a best-fit z_r = 10.5 ± 1.2.
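
One plausible reading of "combining in derivative space in quadrature" is to add the inverse durations as a root-sum-square and convert back. The Python sketch below does exactly that for the quoted statistical (0.19) and systematic (0.21) values. The paper does not spell out the procedure in this section, so the function is an assumption rather than the authors' calculation.

```python
import math


def combine_in_derivative_space(dz_stat, dz_sys):
    """Add the 1/dz bounds in quadrature, then convert back to a duration."""
    inverse_combined = math.hypot(1.0 / dz_stat, 1.0 / dz_sys)
    return 1.0 / inverse_combined


dz_combined = combine_in_derivative_space(0.19, 0.21)
print(f"combined 68% bound: dz < {dz_combined:.3f}")
```

With the rounded inputs above this gives roughly 0.14, close to but not exactly the quoted 0.13; the difference may simply reflect rounding of the published inputs.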

The method demonstrated here is the only mechanism available at 
present to probe directly the derivative of the neutral fraction with 
redshift in the early Universe, and hence, to offer a unique way to 
constrain the reionization history. CMB anisotropy measurements 
probe reionization indirectly through an integral constraint on the 
optical depth to Thomson scattering of CMB photons and large-scale 
features in the E-mode polarization by free electrons in the intergalactic 
medium after reionization, whereas high-redshift quasar absorption 
spectra and Lyman-α galaxy surveys provide only a snapshot of the
neutral fraction at the end of reionization. 

Future enhancements to the EDGES instrument are forecast to
improve the constraints presented here by an order of magnitude and
should be particularly valuable in combined analysis with CMB and
Lyman-α quasar spectra. The current measurement also serves as a
pathfinder for extending the techniques to higher redshifts (lower
frequencies) of z ~ 20 to search for absorption signatures in the
global 21-cm signal that should be more distinct than the reionization
transition, but may be masked by larger foreground contributions. Such
high-redshift global 21-cm observations may eventually provide unparalleled
information about the ultraviolet emission and radiative feedback
from the very first stars through the Wouthuysen-Field-effect-induced
coupling of the kinetic and spin temperatures of the hydrogen
gas in the intergalactic medium during the epoch of first light.

Figure 2 | Lower confidence bounds on the duration of the reionization
transition. Statistical and systematic uncertainties are included. Grey
indicates the 68% confidence bound and black the 95% bound. The white
region is allowed by the data. The data rule out rapid reionization histories
shorter than Δz ~ 0.1 for many redshifts in the range 6 < z < 13. The two large
gaps at redshifts z ~ 9.5 (138 MHz) and z ~ 10 (130 MHz) are at frequencies
that require extensive radio-frequency interference excision because they fall
into satellite and aircraft communication bands, respectively.


Snapshots of cooperative atomic motions in the optical suppression of charge density waves


Maximilian Eichberger, Hanjo Schäfer, Marina Krumova, Markus Beyer, Jure Demsar, Helmuth Berger,

Macroscopic quantum phenomena such as high-temperature
superconductivity, colossal magnetoresistance, ferrimagnetism
and ferromagnetism arise from a delicate balance of different
interactions among electrons, phonons and spins on the nanoscale.
The study of the interplay among these various degrees of
freedom in strongly coupled electron-lattice systems is thus crucial
to their understanding and for optimizing their properties.
Charge-density-wave (CDW) materials, with their inherent
modulation of the electron density and associated periodic lattice
distortion, represent ideal model systems for the study of such
highly cooperative phenomena. With femtosecond time-resolved
techniques, it is possible to observe these interactions directly by
abruptly perturbing the electronic distribution while keeping track
of energy relaxation pathways and coupling strengths among the
different subsystems. Numerous time-resolved experiments
have been performed on CDWs, probing the dynamics of the
electronic subsystem. However, the dynamics of the periodic lattice
distortion have been only indirectly inferred. Here we provide
direct atomic-level information on the structural dynamics by
using femtosecond electron diffraction to study the quasi-two-dimensional
CDW system 1T-TaS2. Effectively, we have directly
observed the atomic motions that result from the optically induced
change in the electronic spatial distribution. The periodic lattice
distortion, which has an amplitude of ~0.1 Å, is suppressed by
about 20% on a timescale (~250 femtoseconds) comparable to half
the period of the corresponding collective mode. These highly
cooperative, electronically driven atomic motions are accompanied
by a rapid electron-phonon energy transfer (~350 femtoseconds)
and are followed by fast recovery of the CDW (~4
picoseconds). The degree of cooperativity in the observed structural
dynamics is remarkable and illustrates the importance of
obtaining atomic-level perspectives of the processes directing the
physics of strongly correlated systems.

1T-TaS2 is one of the most-studied quasi-two-dimensional CDW
systems. It has a simple crystalline structure, consisting of planes of
hexagonally arranged tantalum (Ta) atoms, sandwiched by two sulphur
(S) layers coordinating the central Ta atom in an octahedral
arrangement. In the low-temperature CDW phase, the conduction
electron density becomes modulated, modifying the forces among the
ions and generating a periodic lattice distortion (PLD) with a periodicity
of ~12 Å. This effect is illustrated in Fig. 1a, b together with
the corresponding potential energy surfaces, U(Q), where Q is the
generalized coordinate of the atomic displacements. The corresponding
changes in U(Q) result in a shift of the equilibrium atomic positions
to introduce a PLD. In 1T-TaS2, the transition from its metallic,
unmodulated, phase to an incommensurate CDW phase (ICP) happens
at 550 K. At 350 K, a transition to a nearly commensurate CDW phase
(NCCP) occurs, where the amplitude of the PLD increases abruptly
from 0.03 to 0.1 Å and the CDW wavevector undergoes a sudden
angular rotation from φ = 0° to ~12.3° with respect to the fundamental
lattice vector of the host (unreconstructed) lattice. Finally,
a transition to a commensurate CDW phase (CCP) takes place at
180 K, with φ = 13.9° and a √13 × √13 periodicity. This phase
transition is characterized by the appearance of the gap throughout
the Fermi surface, and is argued to be due to Mott localization. The
appearance of the ICP can be described by the standard Peierls model.
According to this, in low-dimensional systems the divergence in the
static electronic susceptibility at the wavevector 2k_F, connecting parallel
Fermi surfaces at k_F and -k_F (where k_F is the Fermi wavevector),
gives rise to an instability of conduction electrons against the formation
of the electron density modulation. Indeed, the comparison of the
topology of the Fermi surface with parallel sections that can be connected
by the modulation wavevector favours the standard Peierls
model for the emergence of the ICP. The nature of the CCP and
the NCCP, as well as of the ICP-NCCP and NCCP-CCP transitions, is,
however, still under debate. Recently, 1T-TaS2 received additional
attention owing to the observation there of superconductivity below
5 K under high pressure.

Figure 1 | FED data in the NCCP in 1T-TaS2. a, b, Schematic real-space
images of Ta atoms and the electron density in the metallic (a) and the CDW
(b) states together with the corresponding potential energy, U, as a function of
generalized coordinate, Q. c, The diffraction pattern of 1T-TaS2 at 200 K
(intensity is shown on a logarithmic scale in arbitrary units). Each Bragg
reflection is surrounded by six first-order CDW reflections at the scattering
wavevectors q_i, which each has an out-of-plane component of ±1/3c* (red and
blue circles, respectively, in inset). The projections of the q_i on the basal plane,
with a modulus of ~0.28a*, are tilted away from the closest fundamental lattice
vector by an angle φ ~ 12.3°. d, Magnified view of the diffraction intensity (I)
near the (210) Bragg peak (see box in c; for presentation purposes the
diffraction image was symmetrized with respect to the six-fold axis). The
secondary CDW reflections at the wavevector corresponding to the difference
of the wavevectors of the first-order CDW peaks are clearly resolved.

In this study, we investigated the structural dynamics of the PLD in 
30-nm-thick, free-standing slices of 1T-TaS2. We performed femtosecond
electron diffraction (FED) experiments in transmission geometry
along the c axis, that is, perpendicular to the TaS2 layers. The
films were photoexcited with 140-fs optical pulses, and 50-keV elec- 
trons, in bursts of =250 fs, were used to monitor the structural changes 
by recording time-delayed diffraction patterns. The diffraction pattern 
of the NCCP (200K) recorded in this set-up is shown in Fig. 1c 
together with the assignment of some of the scattering vectors. The 
intense peaks are the Bragg reflections of the host lattice. Each of the 
Bragg peaks is surrounded by six weak satellite peaks originating from 
the PLD, with modulation wavevectors q; (ref. 21), illustrated in Fig. 1c 
(inset) and Fig. 1d. 

Figure 2a-e shows the time evolution of the relative change of the
diffraction signal in the vicinity of a Bragg peak, following photoexcitation.
The corresponding traces of the relative changes in the Bragg
peaks (ΔI_Bragg/I_Bragg), the inelastic background (ΔI_bckg/I_bckg) and the
CDW peaks (ΔI_CDW/I_CDW) are shown in Fig. 2f (see also Supplementary
Fig. 3). The intensity of the CDW peaks (the satellites of
the Bragg peaks), I_CDW, is suppressed by ~30% on the timescale of
hundreds of femtoseconds. The corresponding suppression of the PLD
gives rise to more-efficient scattering into the Bragg reflections of the
host lattice, manifested by an increase of the Bragg peak intensity,
I_Bragg, by ~15%. In the CDW state, the presence of the PLD suppresses
I_Bragg similarly to the effect of thermally induced disorder; that is, the
presence of the PLD can be looked upon as an effective Debye-Waller
effect. The decrease in I_CDW and the accompanying increase in I_Bragg
thus illustrate a cooperative phenomenon in which the optically
induced redistribution of electron density efficiently decreases the
PLD amplitude. Because I_CDW is proportional to the square of the
atomic displacements, the resulting suppression of I_CDW, by ~30%,
corresponds to a ~16% change in atomic displacements (~0.02 Å).
Following the initial increase, I_Bragg is found to partially recover on
the 350-fs timescale. This recovery is accompanied by an increase in
the inelastic background intensity, I_bckg; see the intensity changes in
the area indicated by the circle in Fig. 2e for the frames between 300 fs
(Fig. 2c) and 5,800 fs (Fig. 2e). This process can be attributed to the
generation of phonons with non-zero momentum (q ≠ 0); hence, I_Bragg
is reduced owing to the conventional Debye-Waller effect, leading to
an increase in the inelastic background.
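
The step from a ~30% drop in CDW peak intensity to a ~16% drop in displacement follows directly from the quadratic dependence of intensity on displacement. A minimal Python check is given below, assuming a pure square-law dependence and the ~0.1 Å PLD amplitude quoted for the NCCP; the function name is illustrative.

```python
import math

PLD_AMPLITUDE_ANGSTROM = 0.1  # NCCP amplitude quoted in the text


def displacement_change(intensity_suppression):
    """Fractional drop in displacement u when intensity I (proportional to u**2) drops by the given fraction."""
    return 1.0 - math.sqrt(1.0 - intensity_suppression)


fraction = displacement_change(0.30)
absolute = fraction * PLD_AMPLITUDE_ANGSTROM
print(f"fractional change: {fraction:.3f}")
print(f"absolute change:   {absolute:.3f} angstrom")
```

The result, about 0.16 in relative terms and somewhat under 0.02 Å in absolute terms, matches the rounded figures in the text.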

A noteworthy feature of the data shown in Fig. 2f, and elaborated on
in Fig. 3a, is the apparent difference between the dynamics of
ΔI_Bragg/I_Bragg and ΔI_CDW/I_CDW. Although the maximum in I_Bragg is reached at
a time delay of ~300 fs (Fig. 3a, dashed vertical line), the minimum in
I_CDW is reached at a time delay of ~500 fs (Fig. 3a, solid vertical line), at
which point I_Bragg has decreased from its maximum. This difference
can be naturally explained by considering the effects of both the suppression
of the PLD and the increase in the q ≠ 0 phonon population
on the two diffraction intensities. For the case of I_Bragg, the first effect
gives rise to its increase as the periodicity of the host lattice is enhanced,
and the increase in the q ≠ 0 phonon population (the Debye-Waller
effect) has the opposite effect. Indeed, from the fast recovery of I_Bragg
and the corresponding increase in I_bckg it follows that the energy
transfer from electrons to q ≠ 0 phonons in 1T-TaS2 takes place on
the timescale of a few hundred femtoseconds (τ_e-ph ~ 350 fs). For
I_CDW, both the displacive excitation of highly correlated atomic
motions and phonon-induced disorder contribute to its suppression,
explaining the longer timescale on which the minimum of I_CDW is
reached.

It is instructive to compare the structural dynamics data with those
of the electronic subsystem. We have performed all-optical pump-probe
measurements, where the dynamics are mainly sensitive to
the changes in the electronic properties. The photoinduced reflectivity
change (Fig. 2g) shows a rapid onset on the 100-fs timescale, followed
by a fast recovery with a decay time of 150 fs and subsequent slower
decay with a relaxation time of ~4 ps, which is nearly identical to the
CDW recovery time observed in the FED studies. The short decay
timescale is identical to the one obtained in the NCCP by time- and
angle-resolved photoemission spectroscopy (tr-ARPES) and can be
attributed to the electron-phonon energy transfer. Because the electron-phonon
scattering rate is strongly momentum dependent (∝ 1/q), it is
quite natural to observe shorter time constants in optics and tr-ARPES
than in FED. In the former experiments the signal is dominated by the
energy transfer to q ~ 0 phonons, whereas in the latter the behaviour of
I_Bragg and I_bckg is governed by the population of large-q phonons. In the
optical data, owing to their high signal-to-noise ratio, in addition to the
electronic response a weak oscillatory signal is observed. The main mode
observed at 2.3 THz is the totally symmetric amplitude mode of the
CDW, whose amplitude is apparently smaller than the noise level in the
FED data.

Figure 2 | Time evolution of the diffraction
intensities following photoexcitation with a
fluence of 2.4 mJ cm⁻². a-e, Evolution of Bragg,
CDW and inelastic background intensities
illustrated as relative change in the diffraction
pattern at several time delays following
photoexcitation with a fluence of 2.4 mJ cm⁻² and a
photon energy of 3.2 eV (see also Supplementary
Fig. 4 and Supplementary Information). These
images were obtained by averaging (area enclosed by
the box in Fig. 1d) over all individual Bragg
reflections to increase the signal-to-noise ratio. The
circle in e represents the area over which the inelastic
background intensity was monitored.
f, Corresponding dynamics of ΔI_Bragg/I_Bragg, ΔI_CDW/I_CDW
and ΔI_bckg/I_bckg with fits to the data (dashed
lines) and the extracted timescales. The suppression
of the PLD, that is, the CDW peak intensity at
negative time delays, is due to an increase in the
sample temperature caused by the photoexcitation
pulse train (accumulative heating). The initial drop
in I_bckg is an artefact, a result of the decrease in the
diffraction intensity of the nearby CDW peaks,
whose tails extend well into the region where the
inelastic background was evaluated (e). g, Dynamics
of the differential reflectivity change, ΔR/R, at
1.55 eV (800 nm), recorded at the same initial
temperature and the same excitation energy density,
together with the fit (dashed line). The signal has
been offset vertically for presentation purposes. The
oscillatory response corresponds to the coherently
excited amplitude mode at 2.3 THz and phonon
mode at 2.1 THz (refs 9, 11, 21). Inset, fast Fourier
transform (FFT) of the oscillatory component.

Despite the fact that a large amount of energy is transferred to
phonons on the subpicosecond timescale, the system is not yet in
thermal equilibrium 1 ps after photoexcitation. The recovery of the
PLD amplitude is clearly observed in I_Bragg and I_CDW. This timescale
is well decoupled from both subpicosecond timescales. By fitting the
recovery of I_CDW with an exponential decay, we obtain a CDW recovery
time of τ_rec = 4 ps. As this timescale is much longer than the oscillation
period of the amplitude mode, it is reasonable to assume that the
electronic part of the order parameter follows the PLD on the
aforementioned timescale. Here the process that governs the CDW
recovery dynamics is the thermalization with the longer-wavelength
acoustic phonons by means of anharmonic phonon decay. Indeed, the
characteristic linewidths of the low-energy optical phonons are about
10 cm⁻¹, corresponding to lifetimes of 3 ps. The two distinct relaxation
timescales, one of the order of 100 fs and the other of several picoseconds,
are commonly observed in optical experiments in CDWs.
From the direct structural dynamics and optical data on 1T-TaS2, we
can conclude that the longer timescale describes the recovery of the
coupled electron-lattice order parameter and that the shorter timescale
corresponds to the partial recovery of the electronic part alone.

To determine the time constant for the electronic suppression of the
PLD, which leads to an increase in I_Bragg, we analysed its dynamics. By
fitting (Supplementary Fig. 5 and Supplementary Information) the
I_Bragg trace, taking into account the finite optical and electron pulse
widths, we determined a timescale of τ_supp ~ 250 ± 70 fs for the PLD
suppression.

Figure 3 | Early time dynamics and emerging
time evolution of the CDW state in 1T-TaS2 on
photoexcitation. a, Data were recorded with a 40-fs
time step at two excitation fluences, and compared
with the optical ΔR/R data. The maximum induced
changes in the Bragg (dashed vertical line) and CDW
(solid vertical line) peaks were achieved ~300 fs and,
respectively, ~500 fs after photoexcitation. b, The
evolution of the real-space structure of the Ta plane
of 1T-TaS2 following photoexcitation with an intense
optical pulse (circles represent Ta atoms and blue
shading represents the density of conduction
electrons; the amplitudes are strongly exaggerated).
Before photoexcitation (t ~ -1 ps), the Ta atoms are
periodically displaced from their pure 1T structure,
forming a nearly commensurate CDW. Intense
perturbation of the electronic system gives rise to
smearing of the electron density modulation
(t ~ 0.1 ps), driving the lattice towards the
undistorted state (at t ~ 0.3 ps, the hexagonal
symmetry of the pure 1T phase is nearly recovered).
In parallel, the energy is transferred from the
electronic subsystem to phonons on the 300-fs
timescale, resulting in recovery of the electron density
modulation and thermal disordering of the lattice
(t ~ 1 ps). The CDW order is recovered at t ~ 4 ps,
after which time the sample is thermalized at a
somewhat higher temperature.

Information complementary to the above findings comes from considering
the energy flow following photoexcitation. In the experiments
with fluences F = 2-4 mJ cm⁻², no signature of the NCCP-ICP transition
is observed. Only at F = 4.8 mJ cm⁻² is the photoinduced NCCP-ICP
transition realized (Supplementary Fig. 6 and Supplementary
Information), as demonstrated by a strong suppression of the CDW
peak intensity and a rotation of the primary CDW wavevectors, q_i, by
φ ~ 10°. Using the literature values of the optical constants and the
overall specific heat (Supplementary Information), we obtained a temperature
increase of ~180 K at F = 4.8 mJ cm⁻². This implies that the
energy needed to drive the phase transition is comparable to the energy
required simply to heat the sample across the phase transition. The
rapid energy transfer from the electronic system to phonons
(τ_e-ph ~ 150-350 fs), which is competing with the electronically driven
PLD suppression process (τ_supp ~ 250 fs), and the fact that the electronically
excited symmetric amplitude mode does not map into the
rotation of the CDW wavevector, suggest that the NCCP-ICP transition
can be driven only thermally.

The direct structural information obtained with FED, supported by
time-resolved optical and published tr-ARPES data, enabled us to
elucidate the dynamics of the coupled electron-lattice order parameter
(Fig. 3b). Strong photoexcitation and subsequent electron-electron
scattering create a high density of electron-hole pairs within
=100 fs, raising the effective electronic temperature to several thousand
kelvin. The electronic modulation is thereby strongly suppressed,
modifying the potential energy surface U(Q). The collapse
of the double-well potential brings about highly cooperative atomic
motions towards a new quasi-equilibrium. This coherent process is,
however, accompanied by the rapid recovery of the double-well
potential due to cooling of the electronic subsystem through
electron-phonon scattering, which also takes place on the subpicosecond
timescale. The resulting suppression of the PLD amplitude,
by ~20% (0.02 Å), happens within a time of τ_supp ~ 250 ± 70 fs,
that is, about half the period of the amplitude mode, ~440 fs (refs 9,
11). After a time delay of t ~ 300 fs, the periodicity of the underlying
lattice has increased and the amplitude of the PLD has decreased. By
t ~ 1 ps, the electronic modulation has been largely recovered and the
electrons have transferred the energy to q ≠ 0 phonons, randomizing
the atomic motions. Finally, the coupled electron-lattice order parameter
recovers on the timescale of ~4 ps, when the excess energy is
redistributed by further thermalization with low-energy acoustic
phonons.
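
The comparison between the suppression time and the amplitude-mode period can be checked with a simple unit conversion. The Python sketch below takes the 2.3-THz amplitude-mode frequency quoted for the optical data, converts it to a period, and tests whether half of that period lies inside the 250 ± 70 fs window. The function and variable names are illustrative only.

```python
def period_fs(freq_thz):
    """Oscillation period in femtoseconds for a frequency in THz (1 THz corresponds to 1,000 fs)."""
    return 1000.0 / freq_thz


amplitude_mode_period = period_fs(2.3)
half_period = 0.5 * amplitude_mode_period

tau_supp, tau_err = 250.0, 70.0  # fs, as quoted in the text
within_error = (tau_supp - tau_err) <= half_period <= (tau_supp + tau_err)

print(f"period      = {amplitude_mode_period:.0f} fs")
print(f"half period = {half_period:.0f} fs")
print(f"half period within 250 +/- 70 fs: {within_error}")
```

The period comes out slightly under the ~440 fs quoted in the text, and half of it falls inside the stated uncertainty on the suppression time.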

These results demonstrate the extreme robustness of the PLD in 1T-TaS2
against electronic excitation triggered by a femtosecond optical
pulse. By contrast, in the insulating CCP the gap is fully suppressed at
only one-tenth of the absorbed energy density used in our study. The
large difference in the two energy densities presents a strong argument
that the NCCP-CCP transition is indeed Mott driven.

The present work illustrates the importance of directly observing 
atomic motions on timescales short enough to follow even the effect of 
non-equilibrium electronic distributions on strongly correlated lattice 
dynamics. In this respect, the introduction of table-top FED systems
with sufficient brightness and time resolution is opening new pathways
to the investigation of a myriad of cooperative systems in which
electron-lattice correlations have an important role. In systems
with reduced dimensionality, such as quasi-one-dimensional and 
quasi-two-dimensional systems, in which structural changes have a 
predominantly in-plane character, the use of FED may be particularly 
advantageous. Because information about structural dynamics over the 
entire two-dimensional Brillouin zone is obtained in a single experi- 
mental run by FED, it is easy to distinguish between different processes 
that give rise to changes in the diffraction intensities, as in the case of 
1T-TaS2. Moreover, with further instrumental improvements, for
example an increase in the signal-to-noise ratio, FED could be used 
to find signatures of lattice modulations, which may be difficult to 
determine by means of static diffraction methods, much like modu- 
lation optical spectroscopy is used to determine the electronic band 
structure in solids. 


METHODS SUMMARY 


In the present study, we used electron bunches of =250-fs duration containing 
4,000 electrons, each of which had a kinetic energy of 50 keV. The electron beam 
(spot size, 150 μm) was collimated by a magnetic lens to scatter from the sample
and generate a diffraction pattern downstream. The diffraction patterns formed on 
a phosphor screen after being intensified by a multichannel plate, and were 
recorded using a charge-coupled-device camera. The background pressures were 
10 ’and 10 ’ mbar in the electron gun and the sample chamber sections, respect- 
ively. We made the measurements in transmission mode at a repetition rate of 
1 kHz. In this geometrical configuration, spatiotemporal mismatch and surface 
charging effects are negligible. The electron pulse duration was characterized using 
a recently developed electron/laser-pulse cross-correlation method based on pon- 
deromotive scattering and N-body simulations. Photoinduced structural changes 
were initiated by 387-nm, 140-fs pump pulses focused to a spot with a full-width at 
half-maximum of 350 μm. The overall instrumental response time was 240-290 fs.
The sample temperature, of 200 K, was achieved by using a cold finger attached to 
a well-conducting sample holder made of oxygen-free copper. We measured the 
temperature in situ using a calibrated temperature sensor. The 30-nm-thick, single-crystalline
1T-TaS2 slices, ~200 μm × 200 μm in size (Supplementary Figs 2
and 3), were obtained by cleaving a thicker single crystal using an ultramicrotome. 
The slices were picked up from the water surface using a host copper mesh. All- 
optical measurements were performed in reflection geometry using 60-fs optical 
pulses (carrier wavelength, 800 nm) at a repetition rate of 100 kHz. FED and all- 
optical experiments were carried out with the same excitation energy density and 
under the same sample temperature conditions. 
Ice-sheet acceleration driven by melt supply variability 


Christian Schoof 


Increased ice velocities in Greenland are contributing signifi- 
cantly to eustatic sea level rise. Faster ice flow has been associated 
with ice-ocean interactions in water-terminating outlet glaciers 
and with increased surface meltwater supply to the ice-sheet bed 
inland. Observed correlations between surface melt and ice accel- 
eration have raised the possibility of a positive feedback in which 
surface melting and accelerated dynamic thinning reinforce one 
another, suggesting that overall warming could lead to accelerated 
mass loss. Here I show that it is not simply mean surface melt but 
an increase in water input variability that drives faster ice flow. 
Glacier sliding responds to melt indirectly through changes in 
basal water pressure, with observations showing that water 
under glaciers drains through channels at low pressure or through 
interconnected cavities at high pressure. Using a model that 
captures the dynamic switching between channel and cavity 
drainage modes, I show that channelization and glacier decelera- 
tion rather than acceleration occur above a critical rate of water 
flow. Higher rates of steady water supply can therefore suppress 
rather than enhance dynamic thinning, indicating that the melt/ 
dynamic thinning feedback is not universally operational. Short- 
term increases in water input are, however, accommodated by the 
drainage system through temporary spikes in water pressure. It is 
these spikes that lead to ice acceleration, which is therefore driven 
by strong diurnal melt cycles and an increase in rain and surface 
lake drainage events rather than an increase in mean melt 
supply. 

The effective pressure in the subglacial drainage system, defined as 
overburden minus basal water pressure, controls coupling between ice 
and bed: lower effective pressure weakens the ice-bed contact and 
permits faster sliding. Effective pressure is controlled by subglacial 
drainage, which occurs through two principal types of conduit (Fig. 1): 
Röthlisberger channels are kept open by a balance between a 
widening of the channel by wall melting due to heat dissipation in 
the water flow, and a narrowing that results from the inward creeping 
motion of the surrounding ice. By contrast, cavities are formed 
where ice is forced upwards by horizontal sliding over protrusions on 
the glacier bed. This opens a gap in the lee of the protrusion, with gap 
size controlled by the opening rate due to sliding and by creep closure 
of the cavity roof. 

An increase in effective pressure leads to faster creep closure. In an 
equilibrium channel, this must be balanced by greater wall melt. 
Greater wall melt in turn requires higher discharge and, thus, a larger 
channel. Röthlisberger channels therefore increase in size with increas- 
ing effective pressure (decreasing water pressure). This causes water 
flow from smaller channels into larger ones, favouring the formation of 
an arterial network with few main channels at low water pressure. 
Cavities differ from channels as their size is not controlled by wall melt 
and increases rather than decreases with water pressure. A reduction in 
effective pressure suppresses creep closure and allows larger cavities to 
form. This favours macroporous behaviour with spatially distributed 
drainage along the ice-bed interface and water discharge increasing with 
water pressure. The abundance of channels relative to cavities therefore 
determines whether water pressure is low or high in the steady state: 
channels can efficiently transport water at high effective pressure whereas 
cavities require low effective pressure to transport the same flux. Past 
models, however, do not capture switches from cavities to channels in 
spatially extended drainage or the formation of an arterial network, and 
cannot predict the spatial configuration of the drainage system. 

Here I unify the description of cavities and channels and predict 
how spatially extended drainage systems can switch from cavities to 
channels and back. The basic physics of cavities and channels can be 
captured in a single equation for the cross-sectional area, S, of a sub- 
glacial conduit, which can be a channel or cavity (Supplementary 
Information and Fig. 1): 

$$\frac{dS}{dt} = c_1 Q \Psi + u_b h - c_2 N^n S \qquad (1)$$

where Q is the water discharge, $\Psi$ is the hydraulic gradient along the 
conduit and $N = p_i - p_w$ is the effective pressure in the conduit (ice 
Figure 1 | Properties of a single conduit. a, b, Physics of channels (a) and 
cavities (b). c, Conduit opening rate, $c_1 Q \Psi + u_b h$ (dashed line), and closure 
rate, $c_2 N^n S$ (solid line), plotted against S. d, Steady-state N versus Q in a conduit 
(equation (2)). Parameter values are given in Methods Summary. Each conduit 
can generally attain one of two equilibria (points of intersection given as circles 
in c). These can be identified as channel and cavity. The larger (channel) 
equilibrium is prone to instability: if perturbed to slightly larger size, the 
conduit will continue to grow (opening rate exceeds closing rate to the right of 
the intersection). In a network of conduits, this eventually leads to one channel 
growing at the expense of all other nearby ones. The cavity equilibrium, by 
contrast, is stable, and cavities of similar size can coexist. In the steady state, 
effective pressure increases with discharge in a channel (increasing N makes the 
closure curve steeper, moving the channel intersection in c to larger values of S), 
and decreases with discharge in a cavity. A conduit becomes a channel above a 
critical discharge, $Q_c$ (dashed curve in d), and remains a cavity below $Q_c$. 
overburden, $p_i$, minus water pressure, $p_w$). Q is related to S and $\Psi$ 
through the Darcy-Weisbach law, $Q = c_3 S^{\alpha} |\Psi|^{-1/2} \Psi$, where 
$\alpha = 5/4$ and $c_3$ is related to the Darcy-Weisbach friction factor. The 
first term in equation (1) is the rate of conduit opening due to wall 
melting, the second is the rate of opening due to sliding of ice at speed 


[Figure 2 graphic, panels a-d. Recoverable axis labels: x (km), N (MPa) and m (cm d⁻¹).] 


Figure 2 | Steady-state drainage systems. a, b, Example of a drainage system 
formed spontaneously through the channelizing instability. a, Conduit sizes. 
Channels are much larger (dark blue and purple) than the surrounding cavities. 
b, Channels are shown in blue and effective pressure contours are shown at 
0.05-MPa intervals. The pressure distribution reveals how channel-cavity 
interactions control the drainage pattern. Channels are at higher effective 
pressure than the surrounding cavities. Local water pressure maxima (minima 
of N) separate the channels, driving water flow towards them. c, d, Steady-state 
drainage system characteristics as functions of water supply rate, m. c, Channel 
density (average number of channels per unit width of the domain) plotted 
against m. d, Mean of N over the domain plotted against m. Red triangles 
correspond to channelized systems; blue circles correspond to unchannelized 
ones. Open circles show unstable unchannelized systems (which will evolve 
into a channelized state if perturbed). Instability first occurs at a critical water 
supply, $m_c$, corresponding to a critical discharge, $Q_c$. Mean effective pressure 
decreases with water supply (and, hence, discharge) for stable unchannelized 
systems, and increases with water supply for channelized ones. For some 
intermediate values of m (between $m_c$ and a lower limit, $m_m$, that corresponds 
to a critical lower discharge, $Q_m$), both channelized and unchannelized states 
are possible: their low water pressure allows channels to suck in enough water to 
keep themselves open, but the discharge through the system is too low for an 
unchannelized system to channelize spontaneously. A video animation is 
included in Supplementary Information. 
$u_b$ over bed protrusions of size h, and the third is conduit roof closure 
due to viscous creep; $c_1$, $c_2$ and n are constants related to the latent heat 
of fusion and ice viscosity. 

In the steady state, the effective pressure and discharge in a conduit 
are then related through (Fig. 1d) 


$$N^n = \frac{c_1 Q \Psi + u_b h}{c_2 c_3^{-1/\alpha} Q^{1/\alpha} \Psi^{-1/(2\alpha)}} \qquad (2)$$


At low discharge, Q, the effective pressure, N, decreases with Q, as is 
expected for cavities, whereas at higher discharge, N increases with Q 
and the conduit behaves as a Röthlisberger channel. The switch-over in 
behaviour occurs at a critical discharge 


$$Q_c = \frac{u_b h}{c_1 (\alpha - 1) \Psi}$$

Below $Q_c$, the conduit is kept open mainly by ice flow over bed pro- 
trusions; above $Q_c$, it is kept open by wall melting. 
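
The steady-state relation and the critical discharge can be checked numerically. The following is a minimal sketch, not the author's code: it assumes the steady state balances the opening rate $c_1 Q \Psi + u_b h$ against the creep closure rate $c_2 N^n S$, with S obtained from the Darcy-Weisbach relation for a positive hydraulic gradient. The values of α, c3, u_b h and Ψ are those quoted in the Methods Summary and the Fig. 1 caption; the powers of ten in `C1` and `C2` and the creep exponent `N_EXP = 3` are assumptions made here for illustration, and all function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# ALPHA, C3, UB_H and PSI follow the values quoted in the text.
# ASSUMPTIONS: the powers of ten in C1 and C2 and the exponent N_EXP.
ALPHA = 5.0 / 4.0
N_EXP = 3.0                     # creep exponent n (assumed)
C1 = 3.4e-9                     # Pa^-1 (power of ten assumed)
C2 = 4.5e-25                    # Pa^-n s^-1 (power of ten assumed)
C3 = 0.33                       # kg^-1/2 m^3/2
UB_H = 3.0 / SECONDS_PER_YEAR   # m^2 s^-1, i.e. 3 m^2 per year
PSI = 512.0                     # Pa m^-1


def conduit_size(q, psi=PSI):
    """Invert Q = c3 * S**alpha * psi**0.5 (valid for psi > 0) for S in m^2."""
    return (q / (C3 * math.sqrt(psi))) ** (1.0 / ALPHA)


def steady_effective_pressure(q, psi=PSI):
    """Steady-state N in Pa: opening rate equals creep closure rate."""
    opening = C1 * q * psi + UB_H
    closure_per_nn = C2 * conduit_size(q, psi)
    return (opening / closure_per_nn) ** (1.0 / N_EXP)


def critical_discharge(psi=PSI):
    """Discharge at which N(Q) switches from decreasing to increasing."""
    return UB_H / (C1 * (ALPHA - 1.0) * psi)


if __name__ == "__main__":
    q_c = critical_discharge()
    for factor in (0.1, 0.5, 1.0, 2.0, 10.0):
        n_mpa = steady_effective_pressure(factor * q_c) / 1e6
        print(f"Q = {factor:4.1f} Q_c  ->  N = {n_mpa:.2f} MPa")
```

Under these assumed values N is smallest at the critical discharge and rises on either side of it, which is the cavity-to-channel switch described above.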

A linear stability analysis (Supplementary Information) also shows 
that discharge becomes concentrated into a few conduits when the 
mean water discharge through an array of laterally connected conduits 
exceeds $Q_c$: driven by wall melting, a single conduit will grow into a 
large channel (with the properties of a Röthlisberger channel, its size, S, 
Figure 3 | Idealized seasonal evolution of the drainage system. a, The spatial 
mean of effective pressure, N (red lines), plotted against time. The simulations 
shown are forced by a sharp increase (over 1 d) in water supply, m (black line), 
from a wintertime value of 0.33 cm d⁻¹ to a summertime value of 10 cm d⁻¹ 
(solid lines) and 20 cm d⁻¹ (dashed lines). This is followed by steady supply for 
100 d and a gradual return to 0.33 cm d⁻¹. The dots marked b-e correspond to 
the spatial drainage configurations shown in panels b-e, respectively. b-e, The 
drainage system starts close to an unchannelized steady state with small 
conduits (b). The abrupt increase in m leads to a sharp drop in effective pressure 
(a 'spring event'), which opens the drainage conduits to accommodate the 
additional discharge but does not immediately channelize the system 
(c). Efficient channelization causes effective pressure to increase only after some 
time (d), reaching values above those of wintertime. The final drop in m causes 
a temporary jump in effective pressure that leads the system to shut down for 
winter (e). Both simulations in panel a show qualitatively the same response. 
However, the larger jump in water supply (dashed lines in a) leads to a shorter 
and less pronounced period of low effective pressure than the smaller jump 
(solid lines in a). A video animation is included in Supplementary Information. 
Figure 4 | Temporal variations in water input. a, The mean of N (red) over 
the domain plotted against time for two different simulations. Simulations are 
started from a steady-state channelized system and forced with time-dependent 
but spatially uniform water input, m (black), imposing fivefold (solid lines) and 
tenfold (dashed lines) increases in m over 4d. b, The spatial mean of conduit 
size, S, plotted against time. During the initial increase in water input, conduits 
have not yet been able to widen to accommodate increased discharge. To force 
the additional discharge instead requires a temporary spike in hydraulic 
gradient, $\Psi$, leading to higher water pressure (lower N) upstream of the margin 
(red lines). This temporary drop in N is stronger for bigger jumps in m (dashed 
lines in a): $N_{\mathrm{mean}}$ can even drop to zero, which corresponds to complete 
decoupling between ice and bed. Hydrofracture should occur, although this is 
not included in my model. After the initial transient, conduit size adjusts and 
effective pressure increases again, reaching a maximum when m decreases 
again. c, Modelled sliding velocity, $u_{\mathrm{slide}}$, normalized by steady-state sliding 
velocity, $u_0$. Time series of $u_{\mathrm{slide}}/u_0$ are shown corresponding to the solid curves 
in a and b. Sliding is modelled using the empirical relation $\tau_b = C u_{\mathrm{slide}}^{1/p} N$, 
where $\tau_b$ is driving stress in the ice and C and p are constant parameters 
(Supplementary Information). The curves correspond to different values of the 
sliding-law nonlinearity, p, as indicated. In all cases, the initial drop in N leads to 
fast sliding. Recent developments in glacier sliding suggest large values for p, 
for which the magnitude of sliding events is more pronounced. The calculation 
for $u_{\mathrm{slide}}$, however, excludes the effects of stress transfer to other parts of the 
glacier, which would prevent excessively large sliding velocities. 


and effective pressure, N, increasing with discharge, Q) at the expense 
of nearby ones, which shrink to form smaller cavities. Below this 
critical mean discharge, all conduits can be stable at the same size 
and behave as cavities (in which the steady-state effective pressure 
decreases with increasing discharge). 

The nonlinear dynamics of channelization can be captured by con- 
sidering a network of conduits described by equation (1) (Methods 
and Supplementary Information). With mean discharge below a critical 
value, $Q_c$, an initially nearly uniform network remains uniform as pre- 
dicted by linear stability analysis. For a mean discharge level greater 
than $Q_c$, the channelizing instability occurs and the system sponta- 
neously evolves a set of large, well-defined channels fed by smaller ones 
that are separated in turn by cavities (Fig. 2). This effect is similar to melt 
channelization in magmatic systems. The spacing between the channels 
is controlled by lateral effective pressure gradients and decreases with 
increasing water input. An important feature of the nonlinear system is 
that channelization is irreversible. Even if the mean discharge is dropped 
back below $Q_c$, the previously formed channels do not necessarily dis- 
appear: this requires discharge to drop below a lower critical level, $Q_m$ 
(Fig. 2). 
An increase in steady meltwater supply lowers the effective pressure 
and therefore speeds up sliding only below the critical discharge, $Q_c$, 
for channelization (equation (2); Figs 1d and 2d). Once this is exceeded, 
the effective pressure increases again. Channelization increases the 
effective pressure further: concentrated discharge leads to faster channel 
wall melting that must be offset by stronger creep closure, driven by 
increased N. An increase in steady meltwater input therefore has limited 
potential to cause glacier acceleration and will eventually even lead to 
glacier deceleration. 

This result, however, applies only to steady conditions. Observations 
in Greenland indicate that seasonal and short-term water supply var- 
iations can lead to transient acceleration. Ice velocities in some areas 
are consistently above their wintertime average early in the melt sea- 
son, but slow down to below their wintertime average later in summer. 
This can be explained by a seasonal switch from unchannelized to 
channelized drainage, in which a combination of increased water sup- 
ply and incomplete channelization causes low effective pressures in 
early summer (Fig. 3). However, Fig. 3 also shows that channelization 
occurs faster and that the decrease in effective pressures early in the 
melt season is smaller when summertime water supply rates are large. 
Higher summer surface melt rates are therefore likely to suppress the 
magnitude and duration of the period of higher velocities in early 
summer. 

Short-term spikes in water supply can also induce spikes in water 
pressure, and lead to the observed short-term (< 1-day) fast-sliding 
episodes even when the drainage system has channelized (Fig. 4). 
This happens because the size of conduits adjusts slowly (over several 
days), and the drainage system does not have the capacity to accom- 
modate sudden extra water throughput except by an increase in the 
hydraulic gradient, $\Psi$. This increase in $\Psi$ requires higher water pres- 
sures in the interior of the drainage system, leading to lower effective 
pressures and, hence, to faster sliding. Not only can short-term vari- 
ability lead to acceleration even after channelization, but the mag- 
nitude of water pressure excursions during short-term water supply 
spikes can also be much larger than the slower seasonal water pressure 
signal (compare Figs 3a and 4a). 
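
How strongly a drop in effective pressure translates into faster sliding depends on the sliding law. The sketch below is illustrative only and assumes a relation of the form $\tau_b = C u^{1/p} N$ with the driving stress $\tau_b$ held fixed, so that the speed relative to its steady value is $(N_0/N)^p$. The function name and the sample values of p are hypothetical, and stress transfer to other parts of the glacier is ignored.

```python
def sliding_speedup(n_eff, n_steady, p):
    """Return u/u_0 = (N_0 / N)**p for fixed driving stress.

    Assumes tau_b = C * u**(1/p) * N, so u = (tau_b / (C * N))**p.
    n_eff and n_steady are effective pressures in the same units.
    """
    if n_eff <= 0.0 or n_steady <= 0.0:
        raise ValueError("effective pressures must be positive")
    return (n_steady / n_eff) ** p


if __name__ == "__main__":
    # Effective pressure temporarily halved by a spike in water input.
    for p in (1, 3, 5):  # sample nonlinearities, chosen for illustration
        print(f"p = {p}: speed-up factor = {sliding_speedup(1.0, 2.0, p):.0f}")
```

Larger p makes the same pressure excursion produce a much larger velocity excursion, which is why short spikes in water pressure can dominate the sliding signal.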

Ice velocity can therefore respond much more to short-term tem- 
poral variations in water supply than to changes in mean water flow. 
This has major implications for ice-sheet dynamics and feedbacks 
between surface melting and dynamic thinning. More surface water 
input through melt or rain is likely if dynamic thinning draws down 
the ice surface. This can lead to increased ice flow and further thinning 
if the basal water supply is initially very low or if the bed is frozen. 
However, larger rates of summer water supply can also cause faster 
channelization and potential ice deceleration. Further acceleration 
must then be driven instead by short-term temporal variability in 
water supply. This is favoured by strong diurnal cycles and frequent 
rain events, both of which are more likely at lower latitudes, or if the 
ice sheet develops numerous surface lakes that drain abruptly. 

Drainage channelization under glaciers and ice sheets suppresses 
the ability of steady surface water supply to cause further ice accel- 
eration, but faster ice flow can be caused instead by water input 
variations. This is already observable in Greenland and will become 
more important when the climate changes: diurnal melt cycles 
already contribute to ice flow in southern Greenland, and more 
frequent rain events are predicted to result from a northward shift 
of storm tracks over the next century, which will cause further 
ice acceleration. My results are also relevant to palaeo-ice-sheet 
dynamics. Simulations that do not include subglacial processes can- 
not explain the observed rapid collapse of the Laurentide ice sheet. 
A water input/dynamic thinning feedback is a plausible collapse 
mechanism, driven by rain and diurnal melt cycles rather than by 
mean melt alone. Future coupled models are needed to fully explain 
the role of drainage in rapid deglaciation, and my results show that 
channelization and short-term drainage variability are the crucial 
processes that must be captured in these models. 
METHODS SUMMARY 


I model a drainage network in which nodes i and j are connected by a conduit 
(network edge) labelled by subscripts i and j. The conduit evolves according to 

$$\frac{dS_{ij}}{dt} = c_1 Q_{ij} \Psi_{ij} + u_b h - c_2 N_{ij}^n S_{ij}$$

where $S_{ij}$, $Q_{ij}$, $\Psi_{ij}$ and $N_{ij}$ are conduit size, flux, hydraulic gradient and effective 
pressure, respectively. Conduit sizes, $S_{ij}$, and effective pressures at the nodes, $N_i$, are 
the primary variables. I set $\Psi_{ij} = \Psi^0_{ij} + (N_j - N_i)/L_{ij}$, where $L_{ij}$ is the distance 
between nodes and $\Psi^0_{ij}$ is a geometrically controlled background hydraulic gra- 
dient (Supplementary Information). Additionally, $N_{ij} = (N_i + N_j)/2$ and 
$Q_{ij} = c_3 S_{ij}^{\alpha} |\Psi_{ij}|^{-1/2} \Psi_{ij}$. At each node, mass is conserved. Ignoring water storage 
(Supplementary Information), mass conservation requires that 

$$\sum_j Q_{ij} = m_i$$


where the sum is over the nodes, j, connected to the node i, and $m_i$ is water input to 
node i. I use a rectangular lattice network oriented at 45° to downslope, with a 
domain size of 10 km × 20 km and 2 × 10⁴ conduits. I impose N = 0 at the margin, 
zero inflow upstream and periodic sides. Water input is spatially uniform (all $m_i$ 
are the same) and is given as rate of volume input per unit area. The parameters are 
a=5/4, c,=34X10 °Pa', o=45X10 Pa °s |, c5=033kg 1? m>? 
and $u_b h = 3$ m² yr⁻¹ (Supplementary Information). For illustrative purposes, 
$\Psi^0_{ij}$ is based on the shape of a plastic glacier with a yield stress of 10⁵ Pa on a 3° 
slope. In Fig. 1, $\Psi = 512$ Pa m⁻¹ and $u_b h = 3$ m² yr⁻¹, and in Fig. 1c N = 2.85 MPa. 
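
The behaviour of a single network edge can be explored by stepping the conduit evolution equation forward in time. This is a hedged sketch rather than the model used in the paper: it holds N and Ψ fixed at the Fig. 1c values instead of solving for node pressures, uses a simple explicit Euler step, and assumes the powers of ten in `C1` and `C2` and a creep exponent of 3. All names are hypothetical.

```python
SECONDS_PER_DAY = 86400.0
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# ASSUMPTIONS: powers of ten in C1 and C2, and N_EXP = 3.
ALPHA = 5.0 / 4.0
N_EXP = 3.0
C1 = 3.4e-9                     # Pa^-1
C2 = 4.5e-25                    # Pa^-n s^-1
C3 = 0.33                       # kg^-1/2 m^3/2
UB_H = 3.0 / SECONDS_PER_YEAR   # m^2 s^-1


def conduit_rate(s, n_eff, psi):
    """dS/dt = c1*Q*psi + u_b*h - c2*N**n*S with Q from Darcy-Weisbach.

    psi must be non-zero because of the |psi|**(-1/2) factor.
    """
    q = C3 * s ** ALPHA * abs(psi) ** -0.5 * psi
    return C1 * q * psi + UB_H - C2 * n_eff ** N_EXP * s


def integrate_conduit(s0, n_eff=2.85e6, psi=512.0, days=60.0, dt=3600.0):
    """Explicit Euler integration of one conduit at fixed N and psi."""
    s = s0
    for _ in range(int(days * SECONDS_PER_DAY / dt)):
        s = max(s + dt * conduit_rate(s, n_eff, psi), 0.0)
    return s


if __name__ == "__main__":
    print("start small :", integrate_conduit(0.005))
    print("start medium:", integrate_conduit(0.1))
    print("start large :", integrate_conduit(0.5, days=5.0))
```

With these assumed values, small and medium starting sizes relax to the same stable cavity size, whereas a conduit started above the larger (channel) equilibrium keeps growing, consistent with the instability described in the Fig. 1 caption. The runaway case is integrated for only a few days because growth at fixed N and Ψ is unbounded.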
Outer-core compositional stratification from observed core wave speed profiles 


George Helffrich & Satoshi Kaneshima 


Light elements must be present in the nearly pure iron core of the 
Earth to match the remotely observed properties of the outer and 
inner cores. Crystallization of the inner core excludes light ele- 
ments from the solid, concentrating them in liquid near the inner- 
core boundary that potentially rises and collects at the top of the 
core, and this may have a seismically observable signal. Here we 
present array-based observations of seismic waves sensitive to this 
part of the core whose wave speeds require there to be radial com- 
positional variation in the topmost 300 km of the outer core. The 
velocity profile significantly departs from that of compression of a 
homogeneous liquid. Total light-element enrichment is up to five 
weight per cent at the top of the core if modelled in the Fe-O-S 
system. The stratification suggests the existence of a subadiabatic 
temperature gradient at the top of the outer core. 

Many light elements, namely hydrogen, carbon, nitrogen, oxygen, 
sulphur and silicon, could plausibly have been included in the core 
during accretion of the early Earth. Among these, oxygen and sul- 
phur are of interest owing to oxygen's potentially high solubility in iron 
at high pressures and the extremely low-temperature eutectic between 
iron and sulphur, which facilitates segregation into the core. 
Independently of the particular light elements involved, their enrich- 
ment in the liquid of the outer core is potentially observable because it 
affects liquid density and bulk modulus, thereby changing seismic 
wave speeds. Indeed, a series of past seismic studies focused on this 
area specifically to seek evidence for stratification; many suggest 
lower seismic wave speeds in the outer core near the core-mantle 
boundary (CMB) than self-compression of a chemically homogeneous 
outer core would imply. 

Our study uses earthquakes in South America and in the south- 
western Pacific region that emit shear waves towards the core and 
convert to compressional waves that repeatedly reflect from the under- 
side of the CMB (Fig. 1). The arrivals, SmKS, reflect m − 1 times from 
the core side of the CMB. Thus most of their travel time is accrued 
across the area of potential light-element accumulation, between 80 
and 400 km below the CMB. We use stacked records of 120-190 indi- 
vidual observations of SmKS waveforms of three separate earthquakes 
recorded by large-scale seismic arrays in Japan and northern Europe to 
measure differential travel times and slownesses between SmKS and 
SnKS, with m > n, 2 ≤ (m and n) ≤ 5 (Fig. 1). The events are selected 
for their range (separating SmKS multiples with m = 4) and for their 
sampling of different surface and CMB environments. Our array- 
based measurements on single events yield stacked waveforms whose 
differential times and slownesses are the raw data used. Array mea- 
surements provide not only direct differential slowness information; 
more importantly, the stacked waveforms average over source- and 
receiver-side near-CMB path effects due to D″ structure. 

In the model, named KHOCQ, derived from the observations, we 
find that outer-core wave speeds decrease to values a maximum of 0.3% 
lower than PREM wave speeds 60 km into the outer core and gradually 
return to PREM values at a depth of ~300 km (Fig. 2). These small, 
resolvable differences are significant departures from homogeneous 
self-compression of core material. For two regions with very different 
seismic velocity structures in the lowermost mantle, we find a thicker 
anomalous layer (300 km) than any of the earlier studies, owing to the 
use of S4KS and S5KS, which is particularly suitable for investigating the 
outermost core but has not yet been extensively used. Isolation of pro- 
pagation delays accrued at shallowest core levels improves resolution of 
deeper structure. 

To interpret the observations, we model seismic wave speeds in Fe- 
O-S liquids under core conditions (Methods). Given composition, 
pressure and temperature, the model provides liquid bulk modulus, K, 
density, ρ, and, thus, seismic wave speed, $(K/\rho)^{1/2}$. We calculate com- 
positional profiles at the top of the outer core that match our wave 
speed profile to within 0.02%, have average densities in the top 300 km 
of the outer core within 1% of PREM (and thereby obey normal-mode- 
derived density constraints), and that merge with PREM wave speeds 
300 km below the CMB. Feasible variations in iron, oxygen and sul- 
phur that lead to stable stratification are shown in Fig. 3. The modelled 
compositions agree with KHOCQ for oxygen enrichments towards the 
CMB of 0.8-2 wt% and iron depletions of up to 5 wt%; sulphur may be 
either enriched by up to 3 wt% or depleted by 0.8 wt%, depending on 
the oxygen content of the liquid. At light-element enrichment of 5% at 
the top of the core, the total added light-element mass in the region is 
about 11% of the inner core’s mass. We can equate the light-element 
mass expelled by inner-core crystallization with light-element enrich- 
ment in the layer by assuming that the density jump at the inner-core 
boundary, $\Delta\rho_{\mathrm{ICB}}$, multiplied by the inner core's present volume gives 
this mass. Using a previous estimate of 610 ± 180 kg m⁻³ for $\Delta\rho_{\mathrm{ICB}}$ 
attributable to light-element release, we find the light-element mass in 
the layer agrees (at the 95% confidence level) with the mass expelled by 
the inner core if enrichment is 2.5-3 wt% at the top of the outer core. 
The range of seismically defined $\Delta\rho_{\mathrm{ICB}}$ estimates is quite wide, however, 
so the rough mass balance demonstrates the feasibility of the layer’s 
formation mechanism rather than providing strong compositional 
constraints. 
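
Two of the quantities in this paragraph are simple enough to reproduce. The sketch below, offered as an illustration rather than the authors' calculation, evaluates the liquid wave speed $(K/\rho)^{1/2}$ and the light-element mass obtained by multiplying the inner-core boundary density jump by the inner core's volume. The bulk modulus, density and inner-core radius are approximate PREM values assumed here, not taken from the text; only the 610 ± 180 kg m⁻³ density jump is from the paragraph above. Function names are hypothetical.

```python
import math


def bulk_sound_speed(bulk_modulus_pa, density_kg_m3):
    """Compressional wave speed in a liquid, v = sqrt(K / rho), in m/s."""
    return math.sqrt(bulk_modulus_pa / density_kg_m3)


def expelled_light_element_mass(delta_rho_kg_m3, inner_core_radius_m):
    """Density jump times inner-core volume, in kg."""
    volume = 4.0 / 3.0 * math.pi * inner_core_radius_m ** 3
    return delta_rho_kg_m3 * volume


if __name__ == "__main__":
    # ASSUMED approximate PREM values at the top of the outer core.
    k_top = 644.1e9       # Pa
    rho_top = 9903.0      # kg m^-3
    print(f"wave speed: {bulk_sound_speed(k_top, rho_top) / 1e3:.2f} km/s")

    # ASSUMED inner-core radius; density jump range from the text.
    r_ic = 1221.5e3       # m
    for d_rho in (430.0, 610.0, 790.0):
        mass = expelled_light_element_mass(d_rho, r_ic)
        print(f"delta rho = {d_rho:.0f} kg/m^3 -> mass = {mass:.2e} kg")
```

The spread of masses across the quoted uncertainty shows why the balance is described as a feasibility check rather than a tight compositional constraint.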

The lighter material at the top of the core is 5.9% less dense than the 
liquid 300 km below it (and 1.6% lighter than PREM densities on the 
core side of the CMB). The layer's Brunt-Väisälä frequency, a measure 
of the strength of density stratification in the core, is 0.51-1.03 mHz 
(periods of 1.63-3.43 h), suggesting strong stabilization. The density 
excess in the layer, —0.8 x 10 * to —3.2 X 10 7, is about 100 times 
greater than that estimated to explain short-period geomagnetic field 
secular variation and its possible correlation with length-of-day varia- 
tions. The layer’s density gradient exceeds that in the well-mixed outer 
core’® by a factor exceeding 10'°. Owing to the gradational difference 
in composition from the bulk of the outer core at the layer’s base, 
stabilizing it against convective mixing in the outer core is problematic 
(see Supplementary Information for a discussion). A feasible mech- 
anism is a subadiabatic gradient at the top of the outer core due to the 
inability of core heat to escape into the mantle on account of its slower 
convection speeds and low thermal diffusivity. The core liquid 
model (Methods) yields velocities that match our observed profiles 
Figure 1 | Experiment geometry and paths taken by the seismic waves 
through the Earth. a, Sources (X) are subduction-zone earthquakes in South 
America (Argentina) and the southwest Pacific Ocean (Fiji) recorded by 
seismic arrays in Japan (~120 stations) and in Europe (~190 stations). The ray 
paths are superimposed on tomography maps (SB10L18) of shear-wave speed 
variations, $\delta v_S$, at 2,770 km in the lowermost mantle. White stars indicate 
representative core entry points of S3KS. b, SmKS ray paths travel across the 
mantle as shear waves, but across the core as compressional waves. In the core, 
they reflect m − 1 times from the underside of the CMB. c, Record section of 
observed (left, recorded by European stations) and predicted (right) SmKS 
arrivals from the Fiji earthquake. Arrivals and synthetics are aligned on S3KS 
(0 s), and the Preliminary Reference Earth Model (PREM) predicted 
successive SmKS arrivals shown by dashed lines. Reflectivity synthetics for 
PREM (right) show that S4KS and S5KS are delayed. S2KS, which arrives before 
S3KS, is not shown, for clarity. PcPPKiKS is the prominent low-slowness arrival 
in the synthetics at 163-170", not observable in the data. 


either for adiabatic temperature gradients in the topmost 300 km of the 
core or for an isothermal layer for temperatures between 3,000 and 
5,500 K, so the results are independent of the temperature structure 
chosen. Higher temperatures imply a bulk core composition more iron 
rich by 1-2 wt% and correspondingly larger light-element enrich- 
ments. This trade-off between core temperature and light-element 
Figure 2 | Data, velocity profile in the outer core and self-compression 
profile using the velocity profile. a, Raw data are differential travel times (dt) 
and slownesses measured relative to PREM for two regions. Data (and 95% 
observational uncertainty) for the two Fiji events and one Argentina event 
studied are shown with squares, previous model predictions (see b) are shown 
with circles and KHOCQ predictions are shown with stars. Most models 
significantly overestimate differential travel times relative to observations. 
b, Outer-core velocity resulting from τ-p inversion of differential SmKS travel 
times and slownesses (Methods). Velocities are shown relative to PREM and 
are compared with recent models incorporating similar SmKS data: IASP91, 
SP6, AK135 and model 1 from ref. 11. Lines across the top indicate depths 
into the core travelled by each SmKS path. Higher multiples travel to smaller 
depths in the outer core. The KHOCQ error bounds correspond to the range of 
inversions with 2σ travel-time uncertainties applied, roughly the 95% 
confidence level. c, Birch’s parameter (and uncertainty bounds) calculated from 
KHOCQ departs significantly from self-compression in the range 80-300 km 
below the CMB, requiring compositional change in the outer-core liquid. The 
self-compression line shows the theoretical behaviour of a substance whose 
composition does not vary (Methods). Birch’s parameter for PREM closely 
approximates that for self-compression. 
Figure 3 | Compositional variation in the Fe-O-S liquids that match the 
observed wave speed profile and core density constraints. a, Feasible 
composition profiles are shown by lines linking PREM wave speeds at 300-km 
depth in the outer core to wave speeds at the top of the outer core. Profiles are 
isothermal in the topmost 300 km and then join an adiabat initiated at the CMB 
at 4,300 K. For feasibility, core liquid compositions must be within 1% of the 
average PREM density in the top 300 km of the core and must match the 
observed velocity profile. For comparison with previous core composition 
estimates, the filled circle indicates a first-principles core liquid composition 
estimate. b, Compositional variation along feasible profiles for 1% density 
agreement with PREM. The compositional difference at the CMB and the base 
of each feasible profile shown in a (presumably the well-mixed outer core 
represented by PREM wave speeds 300 km below the CMB) is shown here. 
Oxygen enrichment by >0.8 wt% is required in all cases and sulphur 
enrichment is required only at oxygen contents =1 wt%. 


concentration also adds uncertainty to the light-element balance 
between the inner core and the layer discussed earlier. 

The liquid model we use only approximates the true composition of 
the core because it includes only iron and accounts for only oxygen and 
sulphur as light elements. Nickel is also thought to be present in the 
core, in concentrations up to ~5 wt% (ref. 4). However, in these con- 
centrations it seems to act equivalently to iron in sulphur- and oxygen- 
bearing metallic liquids, in the sense that melt surface energies are 
unchanged when nickel replaces an equivalent amount of iron.
Furthermore, (Fe,Ni)-(Fe,Ni)S eutectic temperatures are only 20°C 
lower than Fe-FeS (ref. 21). Thus, except for the minor difference in 
density (<0.1%) due to the substitution of nickel for iron, we expect no 
significant effects on liquid properties. Present cosmochemical models 
for core composition suggest that silicon might also be present in 
concentrations of a few weight per cent. At present, we lack the
requisite melting data on Fe-Si alloys to develop an analogous model
for silicon-bearing melts. We suspect that sulphur might decrease in
proportion to silicon’s increase and lead to a slight decrease in density
(<0.1%, due to the slight atomic weight difference), but the bulk 
modulus effect on wave speeds cannot yet be assessed owing to the 
lack of Fe-Si melting data. 

If the layer is stabilized subadiabatically against convection, core-mantle
reaction might inject oxygen into the outer core. Oxygen diffusion
from the CMB cannot be responsible for the complete depth range 
of the observed anomaly, however. Anion diffusivities in liquid iron 
are estimated to be 10 ''-10 °m*s ‘ (refs 3, 24), leading to diffusion 
length scales of 10-100 km over the lifetime of the Earth. Thus, the 
profile was in large part created by an upward buoyant flux of light 
elements from the crystallization of the inner core. The outer core’s 
velocity profile seems to record the secular evolution of the core’s 
composition as the subadiabatic layer grew, possibly modified by 
core-mantle reaction. We remark in passing that a subadiabatic layer
at the top of the core also has profound implications for the thermal 
evolution of the core and CMB thermal structure. 
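
The order-of-magnitude argument about diffusion can be checked with the usual scaling for a diffusion length. The sketch below assumes L ≈ sqrt(D t) (prefactor conventions vary), an Earth age of 4.5 Gyr, and illustrative diffusivities that are hypothetical and not taken from the paper.

```python
import math

SECONDS_PER_YEAR = 3.156e7               # approximate
EARTH_AGE_S = 4.5e9 * SECONDS_PER_YEAR   # assumed age of the Earth, in seconds


def diffusion_length_km(diffusivity_m2_s, time_s=EARTH_AGE_S):
    """Characteristic diffusion length sqrt(D * t), returned in kilometres."""
    return math.sqrt(diffusivity_m2_s * time_s) / 1000.0


# Illustrative diffusivities in m^2/s (hypothetical, not the paper's values).
for d in (1e-9, 1e-8):
    print(f"D = {d:.0e} m^2/s -> L ~ {diffusion_length_km(d):.0f} km")
```

Because the length grows only as the square root of D t, even a tenfold change in diffusivity changes the length by a factor of about three, which is why diffusion alone struggles to span a layer a few hundred kilometres thick.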


METHODS SUMMARY 


Data are broadband records from regional networks in Europe and Japan. The 
radial component waveforms are uncorrected for instrument response and corre- 
spond to ground velocity given typical instrument characteristics (Supplementary 
Information). We measured differential travel times and slownesses of S3KS- 
S2KS, S4KS-S3KS and S5KS-S3KS on stacked and Hilbert-transformed wave- 
forms (when required). These are the raw data for the inversion. 

We determined velocity profiles by numerical τ–p inversion using up to six basis
functions, yielding flattened earth velocity perturbations, $\delta v_f/v_f$, of the form
$(\delta v_f/v_f)(r) = \pm 0.01\,\max^{n}(0, r - r_i)/(r_{\mathrm{CMB}} - r_i)^{n}$, where r is radius and $r_i$ and $r_{\mathrm{CMB}}$ are
the minimum perturbation radius and the CMB radius, respectively. See
Supplementary Information for fitting details and coefficients. The Fe-O-S liquid 
model is from ref. 14 and is based on thermodynamic fits to the melting curves of Fe, 
FeO and FeS using a 1-bar metallurgical model of liquid free energy. See Methods for 
the fitting procedure and computational methods; liquid thermophysical properties 
are listed in Supplementary Information. The model yields high-pressure and 
-temperature bulk modulus and density, and, therefore, seismic wave speeds in 
outer-core liquid. Because the model is free-energy based, the liquid heat capacity, 
thermal expansivity and, thus, the adiabatic gradient in the liquid can also be 
calculated through thermodynamic identities relating to free-energy derivatives. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


SmKS analysis. We investigate the family of SmKS (m = 2, 3, 4 and 5) waves using 
differential travel times between SmKS waves with different values of m. These are 
sensitive to the velocity structure of the uppermost few hundred kilometres of the 
outer core (see Fig. 2b for the bottoming depths of SmKS). S4KS and S5KS sample 
more predominantly the shallowest 100 km of the core, and their differential travel 
times are insensitive to the velocity structure of the mantle. Robust observations of 
S4KS and S5KS have been hampered by their mutual interference, which we 
ameliorate by array analyses of SmKS at large distances, from 140 to 170°, where 
the arrivals separate. We analyse broadband seismograms of two deep earthquakes 
in Fiji-Tonga observed in Europe and an event in Argentina observed in Japan and 
Taiwan (Supplementary Table 1). The selected events have quite simple impulsive 
source time functions, such that their SmKS phases are separated clearly enough 
from each other up to m = 5, as seen in the record sections of the radial component 
of the broadband seismograms (Fig. 1c and Supplementary Fig. 2). Comparisons 
with the record sections drawn for reflectivity synthetic seismograms for the
same event-network pairs (Fig. 1c and Supplementary Fig. 2) show systematic 
differences across the arrays in the SmKS differential travel times (dt) between the 
observations and PREM. In particular, S3KS (relative to S2KS) and S4KS and S5KS 
(both relative to S3KS) all arrive later than those computed for PREM and have 
about the same delay (~1 s). To quantify these observations accurately, differential 
travel times and slownesses are measured by array techniques using either SKKS 
(called S2KS hereafter, for clarity) or S3KS as the reference phase. Figure 2a shows 
differential travel times that are measured on the linearly stacked waveforms for 
the distances at the array centres and corresponding slownesses between three 
pairs of SmKS waves: S3KS relative to S2KS (called S3KS-S2KS, labelled $dt_{32}$),
S4KS relative to S3KS (S4KS-S3KS, $dt_{43}$) and S5KS relative to S3KS (S5KS-S3KS,
$dt_{53}$; see Supplementary Table 2 for data).

SmKS touches an internal caustic and reflects at the underside of the CMB m − 1
times, suffering a phase delay of nearly π(m − 1)/2 relative to SKS. Consequently,
SmKS and S(m+1)KS are related by a Hilbert transform and SmKS and 
S(m + 2)KS have opposite polarities. To account for the phase shift, we Hilbert- 
transform linearly stacked S2KS. We then slant-stack the observed broadband 
seismograms aligned on S2KS (Supplementary Table 2), and measure the arrival 
time difference between the transformed S2KS and S3KS. For S4KS-S3KS, S3KS is 
used as the alignment phase. The stacked S3KS is Hilbert-transformed and the 
relative travel time of S4KS is measured on the slant-stacked waveforms. S5KS- 
S3KS is measured as the arrival time difference between S3KS and a pulse of the 
opposite polarity on the slant-stacked trace aligned on S3KS (Supplementary Figs 2 
and 3 and Supplementary Table 2). Synthetic tests on the reflectivity seismograms 
for several global Earth models show that these measurements give differential 
travel times between SmKS waves that agree with their ray theoretical predictions 
to within 0.2 s. Slight differences in instrument responses among the stations hardly 
affect the results of array processing according to a synthetic test, so stacking and dt 
measurements are performed on the broadband seismograms to avoid deconvolution
(Supplementary Fig. 3). S3KS-S2KS and S4KS-S3KS are measured by picking
peaks and cross-correlating the stacked waveforms, whereas S5KS-S3KS is mea- 
sured by picking peaks only. The errors in those measurements are evaluated on the 
basis of the 95% confidence levels derived from stacking (Supplementary Fig. 3 and 
Supplementary Table 2). 
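
The phase relations and the cross-correlation measurement described above can be illustrated on a synthetic pulse. This is a minimal sketch and not the authors' processing: the Ricker wavelet, the sampling interval, the 20-sample delay and the sign convention of the Hilbert transform (−i for positive frequencies) are all assumptions, and the helper names `dft`, `hilbert`, `best_lag` and `ricker` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for short traces."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for j in range(n):
        s = sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        out.append(s / n if inverse else s)
    return out


def hilbert(x):
    """Hilbert transform: positive frequencies times -i, negative times +i."""
    n = len(x)
    spec = dft(x)
    for j in range(n):
        if j == 0 or 2 * j == n:
            spec[j] = 0.0       # DC and Nyquist terms are dropped
        elif 2 * j < n:
            spec[j] *= -1j
        else:
            spec[j] *= 1j
    return [v.real for v in dft(spec, inverse=True)]


def best_lag(ref, obs):
    """Circular lag (in samples) maximizing the cross-correlation of obs with ref."""
    n = len(ref)
    return max(range(n), key=lambda lag: sum(ref[i] * obs[(i + lag) % n] for i in range(n)))


def ricker(k, k0, f):
    """Ricker wavelet centred on sample k0 with peak frequency f (cycles/sample)."""
    a = (math.pi * f * (k - k0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


N, DT, DELAY = 128, 0.05, 20                      # samples, seconds, samples (assumed)
s2ks = [ricker(k, 40, 0.05) for k in range(N)]    # stand-in for stacked S2KS
shifted = hilbert(s2ks)
s3ks = [shifted[(k - DELAY) % N] for k in range(N)]   # delayed, phase-shifted arrival

lag = best_lag(hilbert(s2ks), s3ks)   # transform the reference, then correlate
dt = lag * DT                         # differential travel time in seconds
flipped = hilbert(hilbert(s2ks))      # two transforms reverse the polarity
```

Transforming the reference before correlating removes the π/2 phase difference, so the correlation peak falls at the true delay. Applying the transform twice returns the negated pulse, which is the opposite-polarity relation between SmKS and S(m + 2)KS.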
Inversion methodology. The measured differential times and differential slow- 
nesses (dp) relative to those computed from PREM are listed in Supplementary 
Table 2 and shown in Fig. 2a, and the vp anomaly as a function of depth is sought
relative to PREM. The choice of PREM as the reference model is motivated by the
many previous studies which showed that PREM agrees with the SmKS differential
travel times better than any other global reference model, such as IASP91,
AK135 and SP6. Although previous studies suggested the presence of a thin layer
(50-100 km thick) with a vp 1-2% lower than PREM at the top of the outer core, a
recent study shows that the deviations from PREM, if any, should be smaller.

We use a τ–p inversion method to invert the differential time and slowness
measurements to determine the optimum velocity profile. A variety of basis func- 
tions were explored to eliminate parameterization bias. Fit details and coefficients 
are provided in Supplementary Table 3. 

As a result of the inversion, we find that vp values are slightly slower than 
predicted by PREM for the top 300 km of the outer core. As seen in Fig. 2a, b,
the new data of S5KS-S3KS and S4KS-S3KS do not favour a steep and monotonic
reduction of vp as proposed in the previous studies, although a moderately lower vp 
(by ~0.3%) than predicted by PREM at the CMB is certainly required. Near the 
CMB, vp is consistent with the range of feasible solutions in ref. 12. According to 
synthetic tests, density perturbations up to about 1% from PREM do not affect the 
observed relative amplitudes of SmKS above the uncertainty level of the stacked 
seismograms, so we do not have a tight control over density, unlike that over vp. 
Our model is consistent with the measurements of S4KS-S3KS obtained for 
individual seismograms and for a small-aperture array, although the uncertainties
in the previous measurements are far larger than ours. The variations in
global compilations of S3KS-S2KS for individual station-earthquake pairs are 
much larger but are broadly consistent with our model, probably because they 
are prone to bias by mantle-side, small-scale velocity heterogeneity. 

The synthetic seismograms are computed by the reflectivity method, with the
stacked broadband seismograms of S2KS used for the source time function after
correction of phase shift. We emphasize that data for both Fiji events suggest 
similar vp anomalies and that the models (for example KHOCQ; Supplemen- 
tary Fig. 4) obtained for the Fiji events match the differential travel times for the 
Argentina event. The D″ region sampled by the data from Fiji to Europe includes a
root of the Pacific super plume, and that for Argentina to Japan is near a subducted
plate (Fig. 1a and Supplementary Fig. 1). The synthetic stacked seismograms for
the derived model (KHOCQ) show excellent fits to the observed stacked seismograms
for both the Fiji and the Argentina events (Supplementary Fig. 3). The
presence of a layer with a large velocity anomaly in D″ including an ultralow-velocity
zone changes S3KS-S2KS, S4KS-S3KS and S5KS-S3KS by about 0.2 s, but
their relative magnitudes are hardly affected (less than 0.1 s). The minor difference
in the models obtained for the two regions may be partly explained by the difference
in the vs structure in D″. More extreme and smaller-scale anomalies might
affect the differential travel times, but piercing points at the CMB of the SmKS rays 
are spread over 3,000 km and 2,000 km beneath the receiver side and source side, 
respectively. The differential travel times used, therefore, should reflect the mantle 
and core structure averaged over a wide area. All of these considerations indicate 
that lateral variation in the mantle-side structure has only minor effects on the 
model, and that there actually is a velocity anomaly in the outermost core. 
Core liquid model. The thermodynamic properties of dominantly iron liquids 
containing various impurities are well known owing to their commercial import- 
ance in steelmaking. One example is the Fe-O-S system. Experimental constraints 
are available on room-pressure thermophysical properties, leading to tabulations 
of their thermodynamic properties and models for mixing in the liquid state. For 
metallurgical and materials science reasons, the high-pressure properties of solids 
are also known through pressure-volume equation-of-state measurements. 

Liquid properties are not as well characterized at high pressure, however. A ther- 
modynamic approach is used to obtain the properties of Fe-O-S liquids from the 
melting curves of Fe, FeO and FeS, which have been determined experimentally,
and shock-wave measurements on FeO. At the melting point, the free energies, G, of
the solid (s) and liquid (l) are equal. Referring to the standard states, $G^{\circ}$, of the solid
and liquid at reference temperature and pressure ($T_r$, $P_r$), and using the dependence
of free energy on entropy, S, and volume, V, we find that


$$G(P, T) = G^{\circ}(P_r, T_r) + \int_{P_r}^{P} V(p, T)\,dp - \int_{T_r}^{T} S(P_r, t)\,dt$$

S(T) is available from standard tables at ambient pressure, $P_r$, and $V(P_r, T)$ may be
obtained from thermal expansivity measurements on solids at ambient pressure
yielding $\alpha(T)$, through the relation

$$V(T) = V(T_r) \exp\left[\int_{T_r}^{T} \alpha(t)\,dt\right]$$

Solid volumetric properties are obtained by a third-order Birch-Murnaghan equation
of state. The implicit relation between P and V is through a reference volume, $V_0$,
from which the finite strain, f, undergone by compression is expressed as
$f = [(V/V_0)^{-2/3} - 1]/2$. The equation of state depends on isothermal bulk modulus, $K_0$, and its
pressure derivative, $K'$:

$$P(f) = 3K_0 f\,(1+2f)^{5/2}\,[1 - 3f(K'-4)/4] \qquad (1)$$


The required integral, $\int V\,dP$, is usually evaluated by integrating equation (1) by
parts, yielding

$$\int_{P_r}^{P} V\,dp = (V - V_r)(P - P_r) - \int_{V_r}^{V} P\,(df/dv)\,dv$$

In equation (1), $K_0$ is the bulk modulus at $P_r$ but at high T. The Anderson-Grüneisen
parameter, $\delta_T$, is used to weaken the bulk modulus at high temperature,
under the assumption that it is constant above the Debye temperature. Thus,

$$K_0(T) = K_0(T_r) \exp\left[-\delta_T \int_{T_r}^{T} \alpha(t)\,dt\right].$$

$K'$ is taken to be independent of temperature, and, if not otherwise known, $K' \sim 6$ (ref. 38).
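
The two exponential relations above, for V(T) and K_0(T), share the same integral of the thermal expansivity, so they can be evaluated together. The following is a minimal sketch under stated assumptions: the linear form of `alpha`, the reference volume, the reference bulk modulus and the value of the Anderson-Grüneisen parameter are hypothetical placeholders, and the trapezoid rule stands in for whatever quadrature the authors used.

```python
import math


def integrate(f, a, b, n=1000):
    """Composite trapezoid rule for the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def volume_at_T(v_ref, alpha, t_ref, t):
    """V(T) = V(T_r) * exp(integral of alpha from T_r to T)."""
    return v_ref * math.exp(integrate(alpha, t_ref, t))


def bulk_modulus_at_T(k_ref, alpha, delta_t, t_ref, t):
    """K_0(T) = K_0(T_r) * exp(-delta_T * integral of alpha from T_r to T)."""
    return k_ref * math.exp(-delta_t * integrate(alpha, t_ref, t))


def alpha(t):
    """Hypothetical ambient-pressure expansivity in 1/K, linear in temperature."""
    return 4.0e-5 + 1.0e-8 * (t - 300.0)


# Hypothetical reference state: 300 K, molar volume 7.0e-6 m^3, K_0 = 160 GPa.
v_hot = volume_at_T(7.0e-6, alpha, 300.0, 2000.0)
k_hot = bulk_modulus_at_T(160.0, alpha, 5.0, 300.0, 2000.0)
print(v_hot, k_hot)
```

With a constant Anderson-Grüneisen parameter the two relations imply K_0(T)/K_0(T_r) = [V(T)/V(T_r)]^(−δ_T), which is a convenient consistency check on any implementation.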

To calculate $G_l(P, T)$, liquid $K_0$, $K' = \delta_T$ and $\alpha$ are needed. These are estimated
from the melting curves, where $G_s = G_l$, using solid equation-of-state data to
calculate $G_s$ and matching the experimental melting brackets. Grid search over
$K_0$, $\alpha$ and $K'$ yields the values shown in Supplementary Table 4, and Supplementary
Fig. 5 shows the results of the fits to the experimental brackets for the liquids. For 
higher-pressure calculations, entropies of the solid are modified for known phase
transitions shown in Supplementary Fig. 5. These lead to minor changes in the
calculated melting-curve slope, but these are unresolvable given the width of the melting
brackets. For this reason, and owing to their minor effect on gross thermodynamic
properties, structural changes in liquids are ignored. Non-ideal mixing of end-member
liquid is required to fit Fe-FeS eutectic composition variation with pressure;
see Supplementary Information for justification and parameterization.

Velocities in the liquid are calculated from the adiabatic bulk modulus, $K_S$, and
density, $\rho$, at high P and T using the relation $v_P = \sqrt{K_S/\rho}$. The adiabatic bulk
modulus is obtained from the isothermal one using $K_S = K(1 + T\alpha\gamma)$. The core liquid
Grüneisen parameter, $\gamma$, is 1.52 (ref. 30). The high-pressure thermal expansivity used
here depends on the compressed volume via $d\ln(\alpha)/d\ln(V) = \delta_T (V/V_0)^{\kappa}$, with
$\kappa = 1.4$.
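
The chain from isothermal bulk modulus to wave speed can be written out directly. In this sketch the Grüneisen parameter (1.52) and the exponent (1.4) are the values quoted above, while the bulk modulus, density, expansivity and Anderson-Grüneisen parameter are hypothetical round numbers chosen only to give a plausible outer-core magnitude. The function `compressed_expansivity` integrates the stated logarithmic derivative under the assumption that the coefficient multiplying (V/V_0)^κ is constant.

```python
import math

GAMMA = 1.52   # core-liquid Grüneisen parameter quoted in the text
KAPPA = 1.4    # exponent quoted in the text


def adiabatic_bulk_modulus(k_t, temperature, alpha, gamma=GAMMA):
    """K_S = K * (1 + T * alpha * gamma)."""
    return k_t * (1.0 + temperature * alpha * gamma)


def p_wave_speed(k_s, rho):
    """v_P = sqrt(K_S / rho); SI units give metres per second."""
    return math.sqrt(k_s / rho)


def compressed_expansivity(alpha_0, v_ratio, delta_t, kappa=KAPPA):
    """alpha(V) from d ln(alpha)/d ln(V) = delta_t * (V/V0)**kappa.

    Integrating from V0 to V gives
    alpha = alpha_0 * exp[(delta_t / kappa) * ((V/V0)**kappa - 1)].
    """
    return alpha_0 * math.exp((delta_t / kappa) * (v_ratio ** kappa - 1.0))


# Hypothetical inputs: K = 600 GPa, T = 4,300 K, alpha = 1e-5 1/K, rho = 9,900 kg/m^3.
k_s = adiabatic_bulk_modulus(600.0e9, 4300.0, 1.0e-5)
v_p = p_wave_speed(k_s, 9900.0)
print(f"v_P ~ {v_p / 1000.0:.2f} km/s")
```

Because the adiabatic correction term Tαγ is only a few per cent here, the wave speed is controlled mainly by the ratio of the isothermal bulk modulus to density.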

Calculation of the adiabatic gradient in the core, $dT/dz = g\alpha T/C_P$, requires values
for gravitational acceleration, g, and heat capacity, $C_P$. We calculate g at any depth in
the core by linear interpolation between its value at the CMB given by PREM and
zero at the Earth’s centre. The high-pressure heat capacity is obtained from $G_l(P, T)$
through the thermodynamic identity $C_P = -T(\partial^2 G/\partial T^2)$, evaluated numerically.
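
The ingredients of the adiabatic gradient can be sketched as follows, using the standard form gαT/C_P. The CMB radius and gravity are approximate PREM-like values supplied here as assumptions, the finite-difference step is arbitrary, and `toy_free_energy` is a hypothetical function chosen because its heat capacity is known exactly; it is not the paper's liquid model.

```python
import math

R_CMB = 3.480e6   # m, approximate CMB radius (assumed)
G_CMB = 10.68     # m/s^2, approximate gravity at the CMB (assumed)


def gravity(radius):
    """Linear interpolation between zero at the centre and G_CMB at the CMB."""
    return G_CMB * radius / R_CMB


def heat_capacity(free_energy, pressure, temperature, dt=1.0):
    """C_P = -T * d2G/dT2, using a central second difference in temperature."""
    d2g = (free_energy(pressure, temperature + dt)
           - 2.0 * free_energy(pressure, temperature)
           + free_energy(pressure, temperature - dt)) / dt ** 2
    return -temperature * d2g


def adiabatic_gradient(radius, temperature, alpha, c_p):
    """dT/dz = g * alpha * T / C_P, in K per metre of depth."""
    return gravity(radius) * alpha * temperature / c_p


C_TOY = 800.0   # J/(kg K), hypothetical constant heat capacity


def toy_free_energy(pressure, temperature):
    """G = c * (T - T ln T) has d2G/dT2 = -c/T, so C_P = c exactly."""
    return C_TOY * (temperature - temperature * math.log(temperature))


c_p = heat_capacity(toy_free_energy, 0.0, 4300.0)
grad = adiabatic_gradient(R_CMB, 4300.0, 1.0e-5, c_p)   # alpha is hypothetical
print(c_p, grad * 1000.0)   # heat capacity and gradient in K per km
```

The toy free energy is useful only as a test of the differencing: any candidate G(P, T) can be substituted for it without changing the other functions.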
Adiabatic compression of a homogeneous core liquid. When a homogeneous
liquid is adiabatically compressed, $\Phi(r) = v_P^2(r)$ varies approximately with radius
as follows:
ae as +2 as 1+y—367| Tay 
gd \aP), ee 


Here $\delta_T = -(1/\alpha K)(\partial K/\partial T)_P$ and K, $\gamma$ and $\alpha$ are the isothermal bulk modulus, the
Grüneisen parameter and the thermal expansion coefficient, respectively. K(P) is
computed using the third-order Birch-Murnaghan equation. To demonstrate the


smoothness of a profile of $1 - (1/g)\,d\Phi/dr$ for a compressed homogeneous liquid,
we present an example that fits PREM well (Fig. 2c). The following parameters are
used for the computation: $K_0$ = 100 GPa, $K'$ = 4.75, $\rho_0$ = 6,280 kg m$^{-3}$, $\gamma$ = 0.8,
$T_{\mathrm{CMB}}$ = 4,300 K, $dT/dr$ = −0.3 K km$^{-1}$ and $\delta_T$ = 3.33. The thermal expansion
coefficient, x, decreases linearly from 5 X 10 °K | atthe CMBto3X10 °K ! 
at 1,700 km below the CMB. 
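
The quantity 1 − (1/g) dΦ/dr can be evaluated from any tabulated velocity profile by finite differences. This is a minimal sketch: the function name and the synthetic profile are hypothetical, and central differences on interior points are an assumption about how one might discretize the derivative.

```python
import math


def birch_parameter(radius, v_p, gravity):
    """Return 1 - (1/g) dPhi/dr at interior points, with Phi = v_p**2.

    radius in m, v_p in m/s, gravity in m/s^2; all lists of equal length.
    """
    phi = [v * v for v in v_p]
    out = []
    for i in range(1, len(radius) - 1):
        dphi_dr = (phi[i + 1] - phi[i - 1]) / (radius[i + 1] - radius[i - 1])
        out.append(1.0 - dphi_dr / gravity[i])
    return out


# Synthetic profile (hypothetical): Phi falls linearly with radius so that
# dPhi/dr = -0.5 * g, which should give a parameter of 1.5 everywhere.
r = [3.18e6 + 1.0e4 * i for i in range(31)]
g = [10.0] * len(r)
v = [math.sqrt(6.5e7 - 0.5 * 10.0 * (ri - r[0])) for ri in r]
eta = birch_parameter(r, v, g)
print(min(eta), max(eta))
```

A smooth, nearly constant result is what a homogeneous, self-compressed liquid would show; localized departures in a real profile are what the comparison in Fig. 2c is designed to reveal.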
Gene expression divergence recapitulates the
developmental hourglass model 


Alex T. Kalinka, Karolina M. Varga, Dave T. Gerrard, Stephan Preibisch, David L. Corcoran, Julia Jarrells, Uwe Ohler,
Casey M. Bergman & Pavel Tomancak


The observation that animal morphology tends to be conserved 
during the embryonic phylotypic period (a period of maximal 
similarity between the species within each animal phylum) led to 
the proposition that embryogenesis diverges more extensively early 
and late than in the middle, known as the hourglass model. This
pattern of conservation is thought to reflect a major constraint on
the evolution of animal body plans. Despite a wealth of morphological
data confirming that there is often remarkable divergence
in the early and late embryos of species from the same phylum, it
is not yet known to what extent gene expression evolution, which
has a central role in the elaboration of different animal forms,
underpins the morphological hourglass pattern. Here we address 
this question using species-specific microarrays designed from six 
sequenced Drosophila species separated by up to 40 million years. 
We quantify divergence at different times during embryogenesis, 
and show that expression is maximally conserved during the 
arthropod phylotypic period. By fitting different evolutionary 
models to each gene, we show that at each time point more than 
80% of genes fit best to models incorporating stabilizing selection, 
and that for genes whose evolutionarily optimal expression level is 
the same across all species, selective constraint is maximized during 
the phylotypic period. The genes that conform most to the hourglass 
pattern are involved in key developmental processes. These results 
indicate that natural selection acts to conserve patterns of gene 
expression during mid-embryogenesis, and provide a genome-wide 
insight into the molecular basis of the hourglass pattern of develop- 
mental evolution. 

The notion that early development is similar among related animal 
species has been a guiding principle in comparative embryology since 
von Baer (1828) formalized the observation as his third law. Darwin
(1859) believed this to be the most compelling evidence in favour of common
descent, reasoning that adult life-stages will afford the greatest opportunity
for natural selection to operate, and thus adult structures should
show signs of species-specific adaptations more than earlier stages. These
earlier stages, where adaptive opportunities are limited, will ultimately
represent the ‘pruned’ but necessary features of ancestral differentiation.

Despite its intuitive appeal, the principle of early embryonic conservation
has not been supported by morphological studies. Counter
to the expectations of early embryonic conservation, many studies
have shown that there is often remarkable divergence between related
species both early and late in development, often with little apparent
influence on adult morphology. The extensive variation that is seen
in early and late development contrasts with a period of conserved
morphology occurring in mid-embryogenesis. This is known as the
phylotypic period because it coincides with a period of maximal
similarity between the species within each animal phylum.

The morphological conservation evident in the phylotypic period
motivated a proposal of the hourglass model as a revised formulation
of von Baer’s third law. The hourglass model predicts that early and
late divergence is separated by a ‘waist’ corresponding to the phylotypic
period. One of the studies proposing the model argues that an increase in the number of
global interactions between genes and developmental processes during
the phylotypic period renders any evolutionary modification highly
deleterious owing to its damaging side-effects (ref. 2), whereas the other study
views conservation during this period as a consequence of the need for
precise coordination between growth and patterning, which is seen to
be reflected in the genomic organization of the vertebrate Hox genes.

Support for the hourglass model has been found at the morphological
and sequence levels. However, both the model and the
concept of the phylotypic period remain controversial subjects in the
literature, with some studies of heterochrony in vertebrates indicating
that divergence peaks at the phylotypic period or that there is
no temporal pattern of phenotypic conservation.

Although it is generally appreciated that gene expression divergence
has a key role in the evolution of morphological diversity, no studies
so far have addressed the extent to which expression divergence underpins
the morphological hourglass pattern at the genome-wide level.
Here, we test the molecular basis of the hourglass model of develop- 
mental evolution using gene expression data from six Drosophila species 
with sequenced genomes (D. melanogaster, D. simulans, D. ananassae, 
D. persimilis, D. pseudoobscura and D. virilis), thereby enabling unam- 
biguous quantitative comparisons across orthologous genes for a set of 
species separated by up to 40 million years. Gene expression levels were 
measured for 3,019 genes, known to be expressed during embryonic 
development from RNA in situ data, at 2-h intervals for the majority of
embryogenesis using a microarray time course with three biological 
replicates per species and four species-specific probes per gene (Sup- 
plementary Figs 1 and 2). 

For each gene in each species we generated a gene expression time 
course, corrected for differences in developmental time (Supplemen- 
tary Information, Section 2.2), and measured the correlation of the 
resulting temporal profiles for each pair of species (Fig. 1a, b). The
distribution of the correlation coefficients shows that whereas most
genes are positively correlated in their temporal expression, the divergence
in embryonic gene expression follows the known phylogenetic
relationships. These results clearly demonstrate that there is evolutionary
signal across the data set as a whole.
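
The pairwise comparison of temporal profiles can be sketched for a single gene as follows. The expression values are invented for illustration, and Pearson correlation is assumed as the correlation measure; the paper's own correction for developmental time is not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 expression of one gene at eight time points in three species.
profiles = {
    "D. melanogaster": [8.1, 8.4, 9.0, 9.6, 9.9, 9.7, 9.2, 8.8],
    "D. simulans":     [8.0, 8.5, 9.1, 9.5, 9.8, 9.8, 9.1, 8.7],
    "D. virilis":      [8.6, 8.3, 8.9, 9.2, 9.9, 9.4, 9.5, 8.2],
}

# One coefficient per species pair for this gene.
pairwise = {
    (a, b): pearson(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 3))
```

Repeating this for every gene gives, for each species pair, a distribution of coefficients that can then be compared with the phylogenetic distance between the two species.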

To quantify gene expression divergence rigorously we fitted a linear 
model to the expression data. This approach enables us to quantify the 
divergence between species by measuring the influence that different 
species have on the expression of individual genes at specific times during 
development. We extract two different measures of divergence from the 
model: quantitative divergence, which reflects differences in expression 
across the whole time course; and temporal divergence, which reflects 
divergence of temporal profiles at specific time points (Supplementary 
Information, Section 2.6). We show that both of these measures of 
Figure 1 | Gene expression during Drosophila embryogenesis recapitulates
the known phylogeny. a, Between-species pairwise correlation coefficients for 
temporal profiles are depicted using a colour gradient. b, Species profiles are 
shown for two genes with both positive and negative correlations between 
different species pairs (Ahcy89E and Cyp6d5) and two genes that are temporally 
conserved (tara and eIF3-S9). Log2 expression profiles are averaged over probes


divergence recapitulate the known evolutionary relationships between 
the species when the phylogeny is constructed using all of the genes 
simultaneously (Fig. 1c, d). However, despite producing an identical 
topology to the known phylogeny, we see relatively long terminal 
branches in the phylogram, indicating that gene expression divergence 
does not scale with the amount of time separating pairs of species 
(Supplementary Fig. 3).

If temporal expression divergence has saturated through time then we 
would expect to find a reduced capacity for reconstructing the known 
phylogeny at the level of individual genes. To explore this possibility we 
estimated the phylogenetic signal for each gene using a statistic that 
compares the observed phylogenetic signal to what would be expected 
under a process of random evolutionary change. A random evolution- 
ary process produces a phylogeny where closely related species resemble 
each other more than distantly related species as lineages inherit the 
random changes of their ancestors. The results show that at each time 
point the majority of genes exhibit a weaker phylogenetic signal than 
expected under random evolution (Supplementary Fig. 5a). 

Phylogenetic signal may be eroded by stabilizing selection, and to
test for this possibility we compared different evolutionary scenarios by 
fitting four alternative models to the expression data. The models were 
purely random evolutionary change and three stabilizing selection 
models where the optimal expression level may vary between groups 
of species, allowing us to model adaptive changes in expression 
(Supplementary Fig. 4). The stabilizing selection models describe
the change in expression as a combination of random changes and 
stabilizing selection curtailing the accumulating variance. The results 
show that at each time point at least 80% of genes fit best to models that 
incorporate stabilizing selection (Supplementary Fig. 5b). We also see 
that a substantial fraction of the genes fit best to models where there are 
adaptive changes in expression, indicating that a combination of both 
and replicates. Selected correlation coefficients are shown on the plots, and the
P-value refers to quantitative divergence. Time points along the x axis are 2-h 
intervals starting from 0-2h (1) and ending at 14-16h (8). c, The first three 
principal components for quantitative divergence. d, A maximum likelihood 
phylogeny based on temporal expression divergence across all genes. 


stabilizing and directional selection may be acting on a large fraction of 
the genes.
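
The contrast between purely random change and stabilizing selection can be made concrete by comparing how variance accumulates under each. The sketch below assumes that random change is modelled as Brownian motion and stabilizing selection as an Ornstein-Uhlenbeck process; this section does not name the processes, so both are assumptions, as are the parameter names `sigma2` (rate of random change) and `alpha` (strength of selection). The selective constraint follows the definition given in the Figure 2 legend, the negative log of the equilibrium variance.

```python
import math


def brownian_variance(sigma2, t):
    """Variance under random change grows linearly with time."""
    return sigma2 * t


def ou_variance(sigma2, alpha, t):
    """Variance under an Ornstein-Uhlenbeck process after time t."""
    return sigma2 / (2.0 * alpha) * -math.expm1(-2.0 * alpha * t)


def equilibrium_variance(sigma2, alpha):
    """Long-time limit of the Ornstein-Uhlenbeck variance."""
    return sigma2 / (2.0 * alpha)


def selective_constraint(sigma2, alpha):
    """Negative log of the equilibrium variance."""
    return -math.log(equilibrium_variance(sigma2, alpha))


# Hypothetical parameters in arbitrary units.
for t in (1.0, 10.0, 40.0):
    print(t, brownian_variance(0.1, t), ou_variance(0.1, 0.5, t))
```

Under the random model, variance keeps growing with divergence time, whereas under stabilizing selection it saturates. Once it has saturated, distant and close species pairs differ by similar amounts, which is one way phylogenetic signal is eroded.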

The variance between species in the behaviour of a particular gene at a 
particular time point provides a measure of the divergence of a gene’s 
temporal dynamics (see Methods). Plotting these values across all genes 
as a function of time shows that temporal expression divergence follows 
an hourglass pattern with maximal conservation occurring at time point 
5 (8-10h), a period that corresponds to the extended germband stage, 
generally regarded as the arthropod phylotypic period (Fig. 2a). We 
confirmed that the hourglass pattern is not an aggregate behaviour of 
the data set, but is present on a gene-by-gene basis for the majority of 
genes (Supplementary Figs 6 and 7), and also that this pattern is evident in 
the absolute, untransformed gene expression levels (Supplementary Fig. 
8b and Supplementary Information, Sections 2.6 and 2.7). For genes that 
fit best to models where the optimal expression level is the same across all 
species we calculate a measure of selective constraint (Fig. 2b). This
shows that for genes whose evolutionary optimum is the same across 
species, selective constraint is maximized during the phylotypic period 
when gene expression divergence is minimized. Therefore, natural selec- 
tion conserves gene expression patterns during the phylotypic period. 
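
The idea of locating the waist of the hourglass from between-species variance can be sketched on synthetic data. This is a simplified stand-in for the paper's linear-model measure (see Methods and Supplementary Information): the data are generated so that the spread between species is smallest at the fifth time point, and all numbers and names are hypothetical.

```python
from statistics import mean, variance


def temporal_divergence(expression):
    """expression[gene][species][time] -> mean between-species variance per time point."""
    n_times = len(expression[0][0])
    per_time = []
    for t in range(n_times):
        per_gene = [variance([species[t] for species in gene]) for gene in expression]
        per_time.append(mean(per_gene))
    return per_time


# Synthetic data: six species, eight time points, three genes (all hypothetical).
spread = [1.0, 0.8, 0.6, 0.45, 0.3, 0.5, 0.7, 0.9]   # hourglass-shaped spread
offsets = [-1.0, -0.4, 0.1, 0.5, 0.8, 0.0]           # one offset per species
baselines = [8.0, 9.5, 7.2]                          # one baseline per gene
expression = [
    [[base + off * s for s in spread] for off in offsets]
    for base in baselines
]

divergence = temporal_divergence(expression)
waist = divergence.index(min(divergence)) + 1   # time points are numbered from 1
print(waist, [round(d, 3) for d in divergence])
```

In real data the minimum would be assessed with a fitted curve and per-gene checks, as the text describes, rather than read off a single averaged profile.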

To discover the functional classes of genes responsible for driving the 
hourglass pattern in the data, we correlated each gene’s divergence pro- 
file with the average across all genes, thereby allowing us to rank genes by 
their tendency to follow the global hourglass divergence profile. We find 
that these genes are enriched for biological processes involved in cellular 
and organismal development and gene expression (Supplementary 
Tables 1-3). Moreover, functional characterization of genes that follow 
an absolute expression hourglass (Supplementary Fig. 8b) shows that 
they are also enriched for developmental and gene expression processes 
(Fig. 3a and Supplementary Tables 4-6). Taken together, these results 
show that genes involved in core developmental processes conform 
Figure 2 | Temporal expression divergence is minimized during the
phylotypic period. a, Temporal divergence of gene expression at individual 
time points during embryogenesis. The curve is a second-order polynomial that 
fits best to the divergence data. Embryo images are three-dimensional 
renderings of time-lapse embryonic development of D. melanogaster using 
Selective Plane Illumination Microscopy (SPIM). b, Selective constraint for 


strongly to the global hourglass divergence pattern both in terms of 
temporal dynamics and absolute expression differences. 

We also asked whether there are sets of genes that do not follow the
global hourglass pattern and found genes enriched for processes 
involved in secondary metabolism, the immune system, and responses 
to oxidative and wounding stresses (Supplementary Fig. 9a and 
Supplementary Table 7). These are processes that are upregulated late 
in development, such as pigment or chitin metabolism, or processes 
that will be upregulated in response to changes that are independent of 
the developmental program, such as a change in the external envir- 
onment or the presence of a parasite. The transcript levels of genes in 
this latter category will reflect the particular challenges faced by indi- 
vidual embryos and so we would not expect these genes to follow a 
clear temporal pattern of conservation and divergence. These genes 
tend to be zygotically expressed and are largely present in the yolk 
(Supplementary Table 7). 
genes that fit best to single optimum stabilizing selection models, calculated as
the negative log of the equilibrium variance (see Methods and Supplementary 
Fig. 5b). Time points are 2-h intervals starting from 0-2 h (1) and ending at 14- 
16 h (8). Red diamonds indicate the mean; error bars encompass data within 1.5 
times the inter-quartile range, and the boxes show the lower and upper quartiles 
together with the median. 


Independent of the hourglass patterns, our measures of quantitative 
and temporal expression divergence exhibit similar functional associa- 
tions; housekeeping processes tend to be conserved and metabolic 
processes tend to be divergent (Supplementary Tables 8-11 and 
Supplementary Information, Sections 2.8 and 2.9). Given these broad 
functional similarities, it is of interest to ask whether genes in these 
categories of divergence also share similar genomic and gene-level 
features. We observe that genes that diverge quantitatively tend to 
have short introns and 5’ intergenic regions (Fig. 3b) whereas genes 
that diverge temporally have long introns and 5’ intergenic regions 
consistent with the notion that increased regulatory complexity in long 
noncoding regions may provide opportunities for temporal expres- 
sion divergence (Fig. 3c). This increased regulatory complexity is also 
supported by a strong positive correlation between temporal diver- 
gence and tissue specificity. Additionally, temporal divergence is nega- 
tively correlated with mRNA length, raising the possibility that the 
proteins of these shorter genes are engaged in fewer protein-protein 
interactions. We also observe a positive correlation between rates of 
amino acid evolution (dN) and both quantitative and temporal diver- 
gence, supporting similar findings based on adult expression levels, 
and providing further evidence that embryonic expression divergence 
is measuring biologically relevant signals. 

Figure 3 | Properties of genes with different divergence patterns. a, A 
neighbour-joining dendrogram of enriched functional processes for genes that 
follow an hourglass pattern of divergence. b, c, Correlation of gene-level 
variables with quantitative divergence (b) and temporal divergence (c). 

Our results show that gene expression is more resistant to evolu- 
tionary change during mid-embryogenesis than either early or late 
periods of Drosophila development. Evolutionary analyses support 
the notion that this conservation is the result of natural selection acting 
to maintain expression levels and their temporal relationships during 
mid-embryogenesis for genes involved in building up the body plan of 
the larva. These results complement a recent finding suggesting that 
the pupal stage in Drosophila is under strong stabilizing selection due 
to the complexity of the processes that occur during metamorphosis, a 
process that parallels many aspects of embryonic development. These 
findings seem to support the hypothesis of ref. 2 that an increase in global 
interactions constrains evolutionary change of the phylotypic period; 
however, neither study directly addresses the coordination of growth 
and patterning proposed by ref. 1. Such a relationship may be best 
examined in the context of gene regulatory networks. Future studies will 
also need to address the mode and strength of selection acting on gene 
expression with greater resolution by coupling interspecific expression 
divergence with intraspecific variation during embryogenesis. 


METHODS SUMMARY 


RNA was extracted from embryos from six Drosophila species (D. melanogaster, 
D. simulans, D. ananassae, D. persimilis, D. pseudoobscura and D. virilis) reared at 
25 °C. The embryos were aged at 2-h intervals to form a time course. Sixty-base- 
pair-long, species-specific microarray probes (four per species) were selected by 
choosing regions of the orthologous genes of each species that were maximally 
conserved according to an information entropy measure. Candidate probes with a 
G+C content higher than 50% were penalized and hence were less likely to be 
chosen. After scaling the time courses and normalizing replicates, the following 
linear model was fitted to log expression levels: 


$$\log(y_{ijklmn}) = \mu + G_i + S_j + T_k + r_{l(j)} + p_{m(ij)} + GS_{ij} + GT_{ik} + ST_{jk}$$
$$\qquad + rG_{il(j)} + rT_{kl(j)} + rp_{l(j)m(ij)} + pT_{km(ij)} + GST_{ijk} + \varepsilon_{n(ijklm)}$$


where $\mu$ is the global average, $G_i$ is the gene effect, $S_j$ is the species effect, $T_k$ is the 
time effect, $r_{l(j)}$ is the replicate effect nested in species, $p_{m(ij)}$ is the probe effect 
nested in genes and species, and $\varepsilon_{n(ijklm)}$ is the residual error. Values are averaged 
over missing subscripts. Divergence per time point was measured as the between- 
species variance in GST values for each gene separately. We fitted four different 
evolutionary models to the GST values for each gene using the R package ‘ouch’ 
and ranked them by their Akaike Information Criterion (AIC). The models were 
Brownian motion plus three stabilizing selection models with between one and 
three selective optima (Supplementary Fig. 4). Genomic features of genes were 
retrieved from FlyBase release 5.14, adult expression level and tissue specificity 
were retrieved from FlyAtlas, and tissue expression data were retrieved from 
APOGEE (http://fruitfly.org/cgi-bin/ex/insitu.pl). Partial correlations were calcu- 
lated and 95% confidence intervals for each partial correlation were generated 
from 1,000 bootstraps. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Embryo collections and RNA isolation and labelling. Embryos were collected 
from a population of well-fed adults reared at 25 °C. To synchronize the age of the 
embryos in each sample we pre-laid the flies twice for 1 h with a fresh apple juice 
plate with yeast paste before every collection. Another fresh plate with yeast was 
used to collect embryos. The plate was removed from the cage after a 2-h interval 
and aged in the same incubator for the remaining time required by each time point. 
After ageing, embryos were collected and rinsed with water to remove yeast paste, 
and then dechorionated in 100% bleach for 2 min and then washed in desalinated 
water. The embryos were then transferred into a 1.5-ml tube and snap-frozen in 
liquid nitrogen and stored at −80 °C. 

When isolating RNA, embryos were thawed on ice and homogenized with a 
pellet pestle and a pellet pestle cordless motor (Kontes). RNA was isolated with the 
RNeasy Mini kit (Qiagen) and eluted with 30 µl of water. The RNA concentration 
was measured with the NanoDrop spectrophotometer and RNA quality was 
assessed with Bioanalyser using the Agilent RNA 6000 Nano kit. 

To prepare samples for hybridization to the chip, we followed the Agilent One- 
Colour Microarray-Based Gene Expression Analysis protocol version 5.5. The 
starting amount of RNA was normalized to 600 ng for all samples. Samples of a 
given time-course were processed on the same day. 
Probe selection. Probe selection was limited to 60-mers that started within 1 kb 
from the 3’ end of the transcript. The two main factors that influenced subsequent 
probe selection were the similarity of orthologous probes in six species determined 
by information entropy (Supplementary Fig. 1) and the specificity of a probe esti- 
mated by the G+C content-weighted BLAST score (Supplementary Information, 
Section 1.1). 

Additionally, we incorporated the distance from the 3’ end of the transcript into 
the information entropy measure by means of weighting the information entropy 
(Supplementary Fig. 2). Hence, the further away the candidate probe was from the 
end of the transcript, the higher the final information entropy measure became. 
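
The entropy-based conservation measure can be illustrated with a minimal sketch. It assumes that the measure is the Shannon entropy summed over the aligned positions of the orthologous 60-mers, and that distance from the 3' end inflates the score linearly; the function names and the `weight_per_kb` value are hypothetical, and the actual weighting is the one given in Supplementary Fig. 2.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the nucleotides at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def probe_entropy(aligned_probes, distance_from_3p=0, weight_per_kb=0.5):
    """Summed column entropy, inflated by distance (bp) from the 3' end.

    weight_per_kb is an illustrative penalty, not a value from the paper.
    """
    length = len(aligned_probes[0])
    raw = sum(
        column_entropy([probe[i] for probe in aligned_probes])
        for i in range(length)
    )
    return raw * (1.0 + weight_per_kb * distance_from_3p / 1000.0)


# Six orthologous candidates (shortened to 4 nt for readability).
identical = ["ACGT"] * 6
variable = ["ACGT", "ACGA", "ACGT", "ACGC", "ACGT", "ACGT"]
print(probe_entropy(identical), probe_entropy(variable))
```

Lower scores indicate candidate regions that are better conserved across the six species and closer to the 3' end.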

Probe specificity was verified in two steps. We first rejected candidate probes 
that did not have a 60-nucleotide-long match to the respective genome assembly. 
By doing so we eliminated probes that fell on the border of two exons in the 
transcript sequence. For the remaining candidate probes the G+C content- 
weighted BLAST score was calculated. This score was the sum of nucleotides that 
were identical to the query 60-mer and were found in short, unspecific hits. The 
sum was weighted by the G+C content of hits shorter than 60 nucleotides. If the 
G+C content exceeded 50% the sum was multiplied by a factor greater than 1 and 
as a consequence the probe was penalized. Four probes were selected for each gene 
in each species and we calculated the base-pair overlap between the probes and, 
where possible, tried to minimize this value (Supplementary Fig. 13). 
Time-course registration and correlation analysis. To register the time courses 
for different species with different developmental time periods onto a common 
time axis we scaled the non-melanogaster time courses to D. melanogaster by 
maximizing the similarity among the profiles across all genes. The selection of 
genes on the array resulted in a progressive shift of signal intensity distributions 
from bimodal (mixture of non-expressed and expressed genes early) to unimodal 
(most genes expressed late) across the time course (Supplementary Fig. 18). 
Therefore we normalized the replicates for each time point in each time course 
separately using quantile normalization. Next we averaged the probe signal intensities 
using the Tukey biweight algorithm to obtain a single expression value per gene and 
time point in each species while removing outliers. We then re-sampled each time 
course to 100 time points using discrete cosine transform (DCT) interpolation. 
Subsequently, all 3,019 expression profiles of the non-melanogaster species were 
scaled by factors ranging from 0.4 to 1.6 in 0.01 increments to find the optimal 
scaling factor. We calculated squared sums of average differences between all the 
scaled profiles and D. melanogaster profiles and plotted these sums as a function of 
the scaling factor applied (Supplementary Fig. 16). The global minimum in the graph 
corresponded to the scaling factor at which all the profiles of the two species were 
most similar to each other. We applied the optimal scaling factors (Supplementary 
Table 12) to the averaged non-melanogaster profiles with the DCT interpolation 
resulting in registered time courses (four example genes, before and after registration, 
are shown in Supplementary Fig. 17). 
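
The grid search for the optimal scaling factor can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the paper: linear interpolation stands in for the DCT interpolation, the cost is a plain sum of squared differences, and all function names are hypothetical.

```python
import math


def interpolate(profile, position):
    """Linear interpolation of a profile at a fractional index (clamped)."""
    position = min(max(position, 0.0), len(profile) - 1.0)
    low = int(position)
    high = min(low + 1, len(profile) - 1)
    frac = position - low
    return profile[low] * (1.0 - frac) + profile[high] * frac


def rescale(profile, factor):
    """Stretch (factor > 1) or compress (factor < 1) a profile along time."""
    return [interpolate(profile, i / factor) for i in range(len(profile))]


def best_scaling_factor(profiles, reference_profiles):
    """Grid search over 0.40..1.60 in steps of 0.01 for the least-squares factor."""
    best_factor, best_cost = None, float("inf")
    for step in range(40, 161):
        factor = step / 100.0
        cost = sum(
            (a - b) ** 2
            for prof, ref in zip(profiles, reference_profiles)
            for a, b in zip(rescale(prof, factor), ref)
        )
        if cost < best_cost:
            best_factor, best_cost = factor, cost
    return best_factor


# Toy example: one reference profile and a copy that runs 1.25 times faster.
reference = [math.sin(i / 10.0) for i in range(100)]
faster = rescale(reference, 0.8)
print(best_scaling_factor([faster], [reference]))
```

In this toy case the search should recover a factor close to 1.25, the value that stretches the faster time course back onto the reference axis.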

To compare the overall shape of the profiles among species we row normalized 
the gene expression values for each gene in each time course and calculated pair- 
wise correlation coefficients for all pairs of orthologous genes. Genes ordered by 
this simple measure of similarity give an intuitive impression of the amount of 
conservation of temporal profiles among each pair of species (Fig. 1a). 

For the statistical analysis described below, we applied the optimal scaling 
factors to the raw log2 Agilent array signal intensities and subsequently quantile- 
normalized each time point in each time course separately, as described above. 
Linear models. A global ANOVA model was fitted to the data to partition the main 
effect variables and interactions of biological interest from random factors and 
residual error. This normalizes the gene expression values and provides a single 
coherent statistical framework in which to explore the variance and covariance 
structure of the data. The model for gene expression, $y_{ijklmn}$, is a five-factor, 
partially nested, mixed-model ANOVA 


$$\log(y_{ijklmn}) = \mu + G_i + S_j + T_k + r_{l(j)} + p_{m(ij)} + GS_{ij} + GT_{ik} + ST_{jk}$$
$$\qquad + rG_{il(j)} + rT_{kl(j)} + rp_{l(j)m(ij)} + pT_{km(ij)} + GST_{ijk} + \varepsilon_{n(ijklm)}$$


where $\mu$ is the global average, $G_i$ is the gene effect, $S_j$ is the species effect, $T_k$ is the time 
effect, $r_{l(j)}$ is the replicate effect nested in species, $p_{m(ij)}$ is the probe effect nested in genes 
and species, and $\varepsilon_{n(ijklm)}$ is the residual error. Values are averaged over missing 
subscripts. The probe and replicate effects are random factors in the model and account 
for error variance arising from different probes and from different samples of within- 
strain genotypes respectively. 

The remaining terms are two- and three-way interactions between the main 
factors. The gene-by-species, $GS_{ij} = \log(y_{ij}) - G_i - S_j - \mu$, and gene-by-species-by-time, 
$GST_{ijk} = \log(y_{ijk}) - G_i - S_j - T_k - GS_{ij} - GT_{ik} - ST_{jk} - \mu$, effects contain 
information about divergence between species. Here we treat time as a categorical variable 
so that we can extract variances at discrete time points in different species. 
Divergence at each time point is then measured as the between-species variance 
in GST values per gene and per time point (Fig. 2a). Mean sums of squares were 
estimated for each variable in the model after subtracting the mean from the data, 
and the resulting ANOVA table is shown in Supplementary Information, Section 
2.4. A Principal Component Analysis (PCA) of the gene × time (GT) effect from 
the above model was computed and the results are shown in Supplementary 
Information, Section 2.3. 
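
As an illustration of how the GST effect and the per-time-point divergence follow from the definitions above, the sketch below works on a balanced table of mean log expression values indexed by gene, species and time. It ignores the replicate and probe terms of the full mixed model, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean, variance


def gst_effects(y):
    """Three-way interaction from a balanced table y[gene][species][time]."""
    genes, species, times = range(len(y)), range(len(y[0])), range(len(y[0][0]))
    mu = mean(y[i][j][k] for i, j, k in product(genes, species, times))
    G = [mean(y[i][j][k] for j, k in product(species, times)) - mu for i in genes]
    S = [mean(y[i][j][k] for i, k in product(genes, times)) - mu for j in species]
    T = [mean(y[i][j][k] for i, j in product(genes, species)) - mu for k in times]
    GS = [[mean(y[i][j]) - mu - G[i] - S[j] for j in species] for i in genes]
    GT = [[mean(y[i][j][k] for j in species) - mu - G[i] - T[k] for k in times]
          for i in genes]
    ST = [[mean(y[i][j][k] for i in genes) - mu - S[j] - T[k] for k in times]
          for j in species]
    return [[[y[i][j][k] - mu - G[i] - S[j] - T[k] - GS[i][j] - GT[i][k] - ST[j][k]
              for k in times] for j in species] for i in genes]


def divergence_per_time_point(gst, gene):
    """Between-species variance of the GST values of one gene at each time point."""
    n_species, n_times = len(gst[gene]), len(gst[gene][0])
    return [variance(gst[gene][j][k] for j in range(n_species))
            for k in range(n_times)]
```

For a purely additive table every GST value is zero, so any non-zero between-species variance reflects a species-specific change in the temporal profile of that gene.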

A reduced version of the above model was fitted as a linear regression to each 
gene separately (the gene effect was dropped) using the R package ‘limma’ version 
3.2.2 (ref. 34). Limma uses an empirical Bayesian approach to infer differential 
expression in individual genes, producing moderated t-statistics with Bayesian- 
adjusted denominators that incorporate information across the entire ensemble of 
genes. By fitting a linear model to each gene separately, limma allows for gene- 
specific error distributions. The probe effect was also dropped from the ANOVA 
model as the probes were normalized using Tukey’s median polish method to fit a 
linear model for gene expression to each gene, $y_{ij} = \mathrm{exp}_i + a_j + e_{ij}$, where $\mathrm{exp}_i$ is the 
normalized gene expression value for gene i, $a_j$ is the probe effect for the jth probe, 
and $e_{ij}$ is the residual error. The species effect from limma is equivalent to the GS 
effect from the global ANOVA, and this value was used for assessing quantitative 
divergence between species (Fig. 3b). 
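
Tukey's median polish, used above to separate probe effects from expression values, can be sketched in a few lines. The layout assumed here (rows are samples of one gene, columns are its probes) and the fixed number of sweeps are illustrative choices, not details taken from the paper.

```python
from statistics import median


def median_polish(table, iterations=10):
    """Decompose table[r][c] into overall + row[r] + col[c] + residual[r][c]."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        for r in range(n_rows):                 # sweep out row medians
            m = median(resid[r])
            row_eff[r] += m
            resid[r] = [v - m for v in resid[r]]
        m = median(col_eff)                     # recentre column effects
        overall += m
        col_eff = [v - m for v in col_eff]
        for c in range(n_cols):                 # sweep out column medians
            m = median(resid[r][c] for r in range(n_rows))
            col_eff[c] += m
            for r in range(n_rows):
                resid[r][c] -= m
        m = median(row_eff)                     # recentre row effects
        overall += m
        row_eff = [v - m for v in row_eff]
    return overall, row_eff, col_eff, resid


# Three samples of one gene measured by four probes with additive probe effects.
expression = [8.0, 10.0, 12.0]
probe_effect = [-1.0, 0.0, 1.0, 0.5]
table = [[e + a for a in probe_effect] for e in expression]
overall, row_eff, col_eff, resid = median_polish(table)
```

Here `overall + row_eff[r]` plays the role of the normalized expression value and `col_eff[c]` that of the probe effect; medians make the fit robust to outlying probes.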

The temporal profiles of genes were compared across species using a PCA-based 
approach. This method quantifies pairwise species differences in temporal pro- 
files for individual genes using the Mahalanobis distance, which is calculated using 
GST values for all time points estimated from the global ANOVA model. The 
Mahalanobis distance is calculated as 


$$D_i^2 = (\Delta Z_i - Z_c)\,\mathrm{cov}(\Delta Z)^{-1}\,(\Delta Z_i - Z_c)^{T},$$


where $\Delta Z_i$ is the species GST score contrast for gene i, $Z_c$ is the centroid for all of 
the GST score contrasts, and $\mathrm{cov}(\Delta Z)$ is the covariance matrix for the difference 
matrix $\Delta Z$. This metric is distributed according to a chi-squared distribution with 
k degrees of freedom where k is the number of principal components included in 
the contrast. We used the Mahalanobis distances as a measure of temporal diver- 
gence between species across all time points (Fig. 3c). The distances were calcu- 
lated using the first three principal components, which together account for 89% of 
the total variance. 
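
A simplified sketch of the distance calculation is given below. It assumes that the contrasts are already expressed as scores on the retained principal components; because such scores are uncorrelated, the covariance matrix is treated as diagonal and the matrix inverse reduces to a division by each component's variance. The function name is hypothetical.

```python
from statistics import mean, variance


def mahalanobis_sq(scores):
    """Squared distance of each gene's PC-score contrast from the centroid.

    scores[i][k] is the contrast of gene i on principal component k.
    A diagonal covariance matrix is assumed (uncorrelated PC scores).
    """
    n_pcs = len(scores[0])
    centroid = [mean(row[k] for row in scores) for k in range(n_pcs)]
    var = [variance(row[k] for row in scores) for k in range(n_pcs)]
    return [
        sum((row[k] - centroid[k]) ** 2 / var[k] for k in range(n_pcs))
        for row in scores
    ]


# Four genes scored on two components.
contrasts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(mahalanobis_sq(contrasts))
```

With correlated variables the full inverse covariance matrix of the formula above would be needed instead of the per-component variances.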

Phylogenetic analyses and evolutionary models. A maximum likelihood phylo- 
geny was constructed with GST values from every gene and every time point using 
the ‘contml’ continuous character restricted maximum likelihood approach 
implemented in PHYLIP version 3.69 (ref. 38) with D. virilis identified as the 
outgroup, and the resulting phylogram was plotted using Dendroscope version 
2.0 (ref. 39) (Fig. 1d). 

We estimated the phylogenetic signal for the GST values at each gene by cal- 
culating the K statistic described in ref. 24 using the R package ‘picante’ version 
1.1-1. The tree used for this purpose was a phylogram based on median dS values 
for ~10,000 orthologous genes, which was then converted to a chronogram in 
the R package ‘ape’ version 2.5-1. 

Ornstein-Uhlenbeck (OU) and Brownian motion models were fitted to the six 
species-specific GST values for each gene at each time point using the R package 
‘ouch’ version 2.6-1 (ref. 25). The OU models fitted to each gene describe evolu- 
tionary change in a trait X over an infinitesimally small increment of time as 
$dX(t) = \alpha(\theta - X(t))\,dt + \sigma\,dB(t)$, where $dB(t)$ describes Brownian motion 
(independent and identically distributed normal random variables with mean 0 and 
variance $dt$), $\sigma$ is the strength of Brownian motion, $\alpha$ is the strength of stabilizing 
selection, and $\theta$ is the trait optimum. Under a purely Brownian process of 
evolutionary change the first term on the right-hand side is absent. This model was 
extended by ref. 25 to include branch-specific values for $\theta$, thereby allowing for 
adaptive evolution along specific branches. We fitted four models to each gene: 
Brownian motion plus three OU models with between one and three stabilizing 
selection optima (Supplementary Fig. 4), based on the chronogram mentioned 
above. Here we did not engage in an exhaustive model fitting endeavour as our 
intention was to demonstrate two things: (1) models incorporating stabilizing 
selection fit best to the majority of the genes, and (2) models incorporating adaptive 
changes in trait optima often out-perform non-adaptive models. 

To avoid treating time points as if they are independent we fitted models to 
subsets of time points. These subsets were chosen by bootstrapping (1,000 boot- 
straps) hierarchical clusters of time points based on GST values for each gene 
separately using the R package ‘pvclust’ version 1.2-1 (ref. 43) and selecting clusters 
of time points with P-values below 0.05. This approach allows different modes of 
selection to operate across different periods of each gene’s time course. 

For each gene the model that showed the best fit to the data was defined as the 
model with the lowest Akaike Information Criterion (AIC), calculated as 
$2k - 2\log(\mathrm{likelihood})$, where k is the number of degrees of freedom in the model in 
question. AIC scores balance the likelihood of a model against its complexity (the 
number of parameters in the model). 
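
The ranking step can be written compactly. In the sketch below the log-likelihoods and parameter counts are invented for illustration and do not come from the fitted models; the model labels are likewise hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log(likelihood)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """fits maps a model name to (log-likelihood, number of parameters)."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits for one gene: Brownian motion and OU with 1-3 optima.
fits = {
    "BM": (-12.0, 2),
    "OU1": (-8.5, 3),
    "OU2": (-8.2, 4),
    "OU3": (-8.1, 5),
}
print(best_model(fits))
```

In this example the extra optima of the two- and three-optimum models improve the likelihood too little to offset the added parameters, so the single-optimum model ranks first.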

After ranking models by their AIC scores, genes for which Brownian motion 
was not ranked first were tested to see if the top-ranked model showed a signifi- 
cantly better fit to the data than Brownian motion using a log-likelihood ratio test. 
The resulting P-values were adjusted using the Benjamini-Hochberg false discovery 
rate correction in the R package ‘multtest’ version 2.4.0 (ref. 44) and models with 
adjusted P-values above 0.05 were dropped down into the Brownian category. We 
then repeated this process, but treating single optimum models as the null model 
and testing models that ranked best with two or three optima against this null model 
to ensure that the resemblance to the phylogeny for these genes was not the result of 
chance under a single optimum across all species. If single optimum models showed 
a better fit then they were, in turn, tested against the Brownian model. 
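
The false discovery rate adjustment applied to the likelihood ratio test P-values follows the standard Benjamini-Hochberg step-up rule, sketched here in plain Python instead of the R package named above. The example P-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):            # start from the largest P-value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values from tests of the top-ranked model against the null.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
# Genes whose adjusted P-value exceeds 0.05 fall back to the null model.
keep_top_model = [p <= 0.05 for p in adjusted]
```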

We extracted a measure of selective constraint from the genes that fitted best to 
single optimum models (Fig. 2b), calculated as the negative log of the equilibrium 
variance, r (ref. 23). 
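
For a single-optimum Ornstein-Uhlenbeck process the equilibrium (stationary) variance has the standard closed form $\sigma^2 / (2\alpha)$. The sketch below uses that form to show how the constraint measure behaves; the use of the natural logarithm and the function names are assumptions.

```python
import math


def equilibrium_variance(sigma, alpha):
    """Stationary variance of an OU process, sigma^2 / (2 * alpha)."""
    return sigma ** 2 / (2.0 * alpha)


def selective_constraint(sigma, alpha):
    """Negative log of the equilibrium variance (natural log assumed)."""
    return -math.log(equilibrium_variance(sigma, alpha))


weak = selective_constraint(sigma=1.0, alpha=0.5)
strong = selective_constraint(sigma=1.0, alpha=5.0)
```

Stronger stabilizing selection (larger alpha) shrinks the equilibrium variance and therefore raises the constraint score.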
Gene Ontology and tissue expression enrichment. Gene Ontology (GO) ana- 
lyses were conducted using the R package ‘topGO’ version 1.14.0 (ref. 45). Three 
enrichment methods were used. For genes that were ranked by a real number score 
(such as a correlation coefficient) a Kolmogorov-Smirnov ranking test was applied 
and GO terms with distributions among the genes that showed significant departure 
from a uniform distribution in a particular direction were deemed to be enriched. 
Unranked sets of genes were tested for enriched GO terms using the ‘elim’ and 
‘parent-child’ algorithms in topGO. The ‘elim’ algorithm decorrelates the local GO 
graph structure to take into account local dependencies between terms so that more 
biologically relevant terms are enriched and the ‘parent-child’ algorithm controls 
for the inheritance bias between parent and child terms in the GO hierarchy. 
Fisher’s exact test was then used to determine enrichment P-values for both of these 
algorithms. The same approach was used to identify enriched tissue expression 
terms from a controlled vocabulary based on in situ expression data by using 
modified code from the topGO package. 

P-values from Kolmogorov-Smirnov tests were adjusted using the Benjamini- 
Hochberg false discovery rate correction, but no correction was applied to the 
‘elim’ and ‘parent-child’ P-values because they are not calculated independently 
for each GO term in these algorithms and are effectively already adjusted. For 
defined sets of genes, the reference set was all of the genes on the chip. 

We plotted a neighbour-joining tree of enriched, non-redundant GO terms for 
Fig. 3a using the R package ‘ape’ and Dendroscope. Terms were enriched for 1,188 
genes that show an hourglass profile in both temporal dynamics and absolute 
expression levels using Fisher’s exact test and selecting terms with adjusted 
P-values below 0.05. 
Correlation of divergence with gene-level variables. Quantitative and temporal 
divergence measures were generated for each of the 15 pairwise species com- 
parisons. Following ref. 48, we converted these to nine branch lengths on the 
known phylogeny using the Fitch-Margoliash least squares method (implemented 
in the PHYLIP program ‘fitch’). Negative branch lengths were set to zero. Total 
expression divergence for each gene is the sum of branch lengths and constitutes 
our ‘quantitative’ and ‘temporal’ measures using the limma or Mahalanobis dis- 
tances, respectively. 


We collated structural, functional and expression data for all of the genes on the 
chip from public databases and previous genome-level studies. These data were 
generated from gene coordinates retrieved from FlyBase Release 5.14 (January 
2009). Only protein-coding genes were retained (as all genes on our chip are 
protein coding) and genes from the heterochromatic portions of the otherwise 
euchromatic chromosome arms were discarded (168 genes from the genome, 
including 25 from our chip data set). 

In addition, data on further variables for 8,500 D. melanogaster genes compiled 
by ref. 48 were obtained from the authors. These data could be assigned to 2,526 of 
the genes on the chip. Gene expression was described by adult expression level and 
tissue specificity (both from FlyAtlas), expression divergence between adults 
(measured in a very similar set of species by ref. 22), and we added the mean 
embryonic expression level from our own data. Gene sequence evolution was 
described by codon bias (the frequency of optimal codons) in D. melanogaster 
and by dN and dS, the rates of non-synonymous and synonymous nucleotide 
substitutions, respectively. 

As many of our variables of interest were correlated with one another, we 
calculated partial correlations between each variable and expression divergence 
while controlling for the other variables. The set of variables included are described 
in Supplementary Information, Section 1.3. We only used filtered genes for which 
we had information on all the variables (n = 1,832). Partial correlations were 
calculated from Spearman’s rank correlation matrices using the R package ‘corp- 
cor’. Ninety-five per cent confidence intervals for each partial correlation were 
generated by boot-strapping (random sample with replacement) the set of genes 
contributing to the correlation. One thousand bootstraps were performed using 
the R package ‘boot’. 
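
The partial correlation and bootstrap steps can be illustrated for the simplest case of one control variable, where the partial correlation has a closed form. This is a sketch in plain Python, not the ‘corpcor’ and ‘boot’ workflow itself; with many control variables the partial correlations would instead be derived from the inverse of the rank correlation matrix. All function names are hypothetical.

```python
import random


def ranks(values):
    """Average ranks (ties share the mean rank), as used by Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            result[idx] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y while controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


def bootstrap_ci(x, y, z, n_boot=1000, seed=0):
    """95% percentile interval from resampling genes with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(partial_spearman([x[i] for i in idx],
                                      [y[i] for i in idx],
                                      [z[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

Resampling whole genes keeps the values of all variables for a gene together, so each bootstrap replicate preserves the correlation structure among the variables.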
A phylogenetically based transcriptome age index 
mirrors ontogenetic divergence patterns 


Tomislav Domazet-Lošo & Diethard Tautz 


Parallels between phylogeny and ontogeny have been discussed for 
almost two centuries, and a number of theories have been proposed 
to explain such patterns. Especially elusive is the phylotypic stage, 
a phase during development where species within a phylum are 
particularly similar to each other. Although this has formerly 
been interpreted as a recapitulation of phylogeny, it is now 
thought to reflect an ontogenetic progression phase, where strong 
constraints on developmental regulation and gene interactions 
exist. Several studies have shown that genes expressed during this 
stage evolve at a slower rate, but it has so far not been possible to 
derive an unequivocal molecular signature associated with this 
stage. Here we use a combination of phylostratigraphy and 
stage-specific gene expression data to generate a cumulative index 
that reflects the evolutionary age of the transcriptome at given 
ontogenetic stages. Using zebrafish ontogeny and adult develop- 
ment as a model, we find that the phylotypic stage does indeed 
express the oldest transcriptome set and that younger sets are 
expressed during early and late development, thus faithfully mir- 
roring the hourglass model of morphological divergence. 
Reproductively active animals show the youngest transcriptome, 
with major differences between males and females. Notably, ageing 
animals express increasingly older genes. Comparisons with sim- 
ilar data sets from flies and nematodes show that this pattern 
occurs across phyla. Our results indicate that an old transcriptome 
marks the phylotypic phase and that phylogenetic differences at 
other ontogenetic stages correlate with the expression of newly 
evolved genes. 

The evolutionary origin of genes can be traced by similarity searches 
in genomes representing the whole tree of life. We have called this 
approach ‘phylostratigraphy’ and have shown that meaningful com- 
parisons can be derived from it (see Supplementary Note 1). It is 
important to note that the procedure identifies specifically the origin of 
novel genes with no traceable relation to existing genes or protein 
domains (see Supplementary Note 2). Another important property 
of phylostratigraphy is that it establishes a phylogenetic scale where 
every gene within a genome has its phylogenetic rank. Here, using this 
phylogenetic hierarchy, we extend this approach by linking it to all 
expressed genes within the ontogenetic sequence. To link these two 
hierarchies quantitatively we developed a transcriptome age index 
(TAI), which integrates the age of a gene with its expression level at 
a given developmental stage and sums this over all genes expressed at 
the respective stage. The higher the TAI, the younger the transcrip- 
tome (see Methods). 
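
A minimal sketch of the index is given below. It assumes that the TAI of a stage is the expression-weighted mean phylostratum of the genes expressed at that stage; the exact definition is the one given in the Methods, and the gene ranks and expression values here are invented.

```python
def transcriptome_age_index(phylostrata, expression):
    """Expression-weighted mean phylostratum of one developmental stage.

    phylostrata[i] is the phylogenetic rank of gene i (1 = oldest) and
    expression[i] is its expression level at the stage of interest.
    """
    total = sum(expression)
    return sum(ps * e for ps, e in zip(phylostrata, expression)) / total


# Five hypothetical genes, from ancient (ps1) to young (ps12).
ranks_ps = [1, 1, 5, 9, 12]
stage_a = [10, 10, 5, 1, 1]    # dominated by old genes
stage_b = [5, 5, 5, 10, 10]    # dominated by young genes
tai_a = transcriptome_age_index(ranks_ps, stage_a)
tai_b = transcriptome_age_index(ranks_ps, stage_b)
```

Stage B has the higher TAI because young genes carry most of the expression, consistent with the reading that a higher TAI means a younger transcriptome.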

To apply the TAI for a developmental model system, we have 
generated a fine-grained series of transcriptome data of zebrafish 
development, covering a total of 60 stages, from unfertilized eggs to 
ageing animals. Figure 1a shows the TAI profile, plotted along these 
stages. The comparatively oldest transcript sets are expressed during 
the late segmentation/early pharyngula stage, which is the develop- 
mental stage that is usually equated with the phylotypic stage in zebra- 
fish. The start of heart pulsations and blood circulation in the embryo 
(24 h) is a morphological feature that approximately marks this 
period of lowest TAI values. Phylogenetically younger transcriptome 
sets are expressed before and after this stage. This correlates well with 
the observation that early and late stages of chordate development also 
show a higher morphological divergence between taxa. During the 
mid-larval stage we see a second phase where older transcriptomes are 
expressed, which corresponds to metamorphosis. Although meta- 
morphosis in fish is not as overt as in some other chordates (for 
example, amphibians), it is nonetheless a phase with major changes 
in morphology and life-history strategy. It is particularly evident in the 
reshaping of the fins, which change from a basal pattern that is seen 
across all fish into the one that is more specific for zebrafish. After 
this stage, the transcriptome becomes younger and peaks in young 
adults. Males and females show major differences in the overall age 
index, with females expressing the relatively youngest genes. 
Intriguingly, as animals become older, they express older genes again. 

Analysing the contribution of the different phylostrata (ps) to the 
general profile shows that they contribute to different extents 
(Fig. 1b). Genes that have emerged before the evolution of metazoa 
(ps1 to ps5) are more equally expressed throughout ontogeny, whereas 
later-emerging genes contribute increasingly to the differential pattern. 
A more detailed contribution of the genes from the different phylostrata 
is summarized in Fig. 2. Here we have depicted the relative expression 
levels for each stage for several phylostrata. This representation is only 
partly comparable to that in Fig. 1b, as it disregards the actual number of 
genes within a phylostratum. But this analysis allows several more spe- 
cific points to be made. 

Most genes that have arisen in ps1 (cellular origin) are general 
enzymes and housekeeping genes, but their RNA is not highly 
expressed before gastrulation (Fig. 2a), indicating that the products 
of these genes are primarily stored as proteins in the egg. Intriguingly, 
this is very different for the genes from ps2 to ps4, which have their 
relatively highest expression levels at these early stages. This is also 
indicative of a correlation between phylogenetic age of a gene and 
ontogenetic use of its product. 

The noticeable TAI peak during gastrulation (Fig. 1a) is mainly 
generated by the genes from ps5 (evolution of metazoa, Fig. 2b). 
Studies in sponges suggest that gastrulation is an embryological pro- 
cess present since the onset of the metazoan evolution, which is in 
agreement with the peak of ps5 genes. In addition, we have previously 
identified ps5 as the time of emergence of genes involved in cellular 
interactions, which are evidently of particular importance during 
gastrulation. 

Genes that have evolved during chordate evolution (ps9) are par- 
ticularly highly expressed at the end of the pharyngula stage and at the 
beginning of larval stages, before metamorphosis (Fig. 2b). This is 
again a very suggestive correlation, because during this phase the 
chordate body plan in zebrafish reaches, for the first time, a full func- 
tional differentiation that is reflected in chordate-specific undulatory 
swimming and the start of active feeding. Interestingly, ps7 genes 
(evolution of bilateria) start to be strongly expressed at the beginning 
of metamorphosis, raising the possibility that metamorphosis-related 
genes or processes have already originated in parallel to the formation 
of bilateria. Ancient origins of hormonal signalling processes asso- 
ciated with metamorphosis have indeed been proposed. 

Figure 1 | Transcriptome age profiles for the zebrafish ontogeny. 
a, Cumulative transcriptome age index (TAI) for the different developmental 
stages. The pink shaded area represents the presumptive phylotypic phase in 
vertebrates. The overall pattern is significant by repeated measures ANOVA 
(P=2.4X 10 1°, after Greenhouse-Geisser correction P = 0.024). Grey 
shaded areas represent ± the standard error of TAI estimated by bootstrap 
analysis. b, Transcriptome indices split according to the origin of the genes 
from the different phylostrata, based on the same developmental series as in 
a. c, Depiction of the phylostrata analysed; numbers in parentheses denote the 
number of array probes analysed for each phylostratum. 

Although comparable fine-grained data sets are currently not avail- 
able for other model systems, one can still compare the trends based on 
available partial data sets. A good developmental transcriptome data 
series exists for Drosophila, although it covers only one-third of the 
expressed genes”’. We have calculated the TAI for these data and find 
that the overall pattern is indeed comparable to zebrafish (Fig. 3). Most 
notably, the relatively oldest transcriptome is expressed during germ- 
band elongation, which can be equated to the phylotypic phase in 
arthropods™. Thus, this molecular signature is qualitatively compar- 
able to the zebrafish data, but there are more novel genes among the 
post-embryonically expressed genes in Drosophila than in zebrafish, 
reflected in larger TAI values from differentiation stages onwards 
(Fig. 3a). Again, we see a major difference between males and females 


Figure 2 | Relative expression of the genes from each phylostratum across 
the zebrafish ontogeny (same stages as in Fig. 1) for selected phylostrata 
with significant differences. See Supplementary Fig. 3 for representation of all 
phylostrata and significance assessments. For easier comparisons, the relative 
expression calculated in relation to the highest (1) and lowest (0) expression 
values across developmental stages is shown (see Methods). Bl, blastula; Cl, 
cleavage; G, gastrula; H, hatching; Juv, juvenile; Ph, pharyngula; Se, 
segmentation; Z, zygote. 

Figure 3 | Transcriptome age profiles for the Drosophila ontogeny, based 
on the data in ref. 23. a, Cumulative transcriptome index for the different 
developmental stages. The pink shaded area represents the presumptive 
phylotypic phase in insects. The overall pattern of differences in TAI is 


after hatching, but in contrast to zebrafish, the males express the 
younger transcriptomes. Similar to the situation in zebrafish, ageing 
Drosophila express increasingly older transcriptomes (Fig. 3a). 
Breaking this pattern down to the contribution from the different 
phylostrata shows again that the oldest genes contribute little to the 
differential pattern, whereas genes that have emerged in ps9 (equival- 
ent to the evolution of Arthropods) and later add increasingly to the 
final profile (Fig. 3b). 

Comparable, but even more limited, ontogenetic transcriptome data 
are also available for the nematode Caenorhabditis elegans* and the 
mosquito Anopheles”’. The same trends can be seen for those as well, 
namely the oldest genes expressed during the embryonic stages, the 
youngest towards adult stages and older genes in ageing animals 
(Supplementary Figs 1 and 2). 

These consistent overall patterns across phyla, as well as the detailed 
analysis within zebrafish, suggest that there is a link between evolu- 
tionary innovations and the emergence of novel genes'®”””*. 
Adaptations are expected to occur primarily in response to altered 
ecological conditions. Juveniles and adults interact much more with 
ecological factors than embryos, which may even be a cause for fast 
postzygotic isolation”. Similarly, the zygote may also react to environ- 
mental constraints, for example, via the amount of yolk provided in the 
egg. In contrast, mid-embryonic stages around the phylotypic phase 
are normally not in direct contact with the environment and are there- 
fore less likely to be subject to ecological adaptations and evolutionary 
change. As already suggested by Darwin (discussed in ref. 15), this 


significant by repeated measures ANOVA (P = 2.5 × 10 °°, after Greenhouse-
Geisser correction P = 1.22 × 10°"). Grey shaded areas represent ± the 
standard error of TAI estimated by bootstrap analysis. b, c, Same as for Fig. 1b, c. 


alone could explain the lowered morphological divergence of early 
ontogenetic stages compared to adults, which would obviate the need 
to invoke particular constraints. Alternatively, the constraint hypo- 
thesis would suggest that it is difficult for newly evolved genes to 
become recruited to strongly connected regulatory networks'*”*"». 

The fact that ageing animals revert to older transcriptomes is in line 
with the notion that animals beyond the reproductive age are not 
‘visible’ to natural selection and can therefore not be subject to specific 
adaptations any more. Also, the fact that major TAI differences can be 
seen between males and females could have been anticipated, because 
sexual selection is expected to continuously change phenotypic traits 
between them. However, the fact that the differences go in opposite 
directions in zebrafish and Drosophila is surprising. We have therefore 
studied in detail which phylostrata contribute most to these differences 
(Fig. 4). Both taxa show a female expression bias of ps2 genes, which 
may be correlated to egg production, as RNA from such genes is stored 
in the eggs (see above). But they strongly deviate at other phylostrata. 
Zebrafish shows a strong female bias of ps6 and ps12 genes, which is 
absent in Drosophila (Fig. 4). Drosophila, on the other hand, shows an 
extreme bias of ps14 genes in males (Fig. 4), which is caused by the 
many orphan genes involved in spermatogenesis*’. Thus, in contrast to 
the ontogenetic similarities of the TAI trends between the two taxa, the 
sex differences are rather incongruent and indicate different evolution- 
ary trajectories for male-female differences. 

Our study provides strong molecular support for a correlate 
between phylogeny and ontogeny, as well as the hourglass model of 

Figure 4 | Comparison of differences in TAI between females and males. 
Comparison across phylostrata in zebrafish (Danio) and Drosophila (see 
Supplementary Fig. 4 for a plot that includes the differences between stages). 
The grey shaded area designates the shared part of the phylogeny between the 
two species (origin of the first cell until the last common ancestor of Bilateria, 
ps1-ps7). Note that the ps14 value for Drosophila is off the scale (difference is 
given in parentheses). 


development. Under this scheme, the phylotypic phase can be defined 
as the ontogenetic progression during which the oldest gene set is 
expressed, either because this is the phase with the lowest opportunity 
for lineage-specific adaptations, or because it is internally so con- 
strained that newly evolved genes cannot become integrated. 


METHODS SUMMARY 


The TAI is the weighted mean of phylogenetic ranks (phylostrata) and is calcu- 
lated for every ontogenetic stage s as follows: 

\[
\mathrm{TAI}_s = \frac{\sum_{i=1}^{n} ps_i \, e_i}{\sum_{i=1}^{n} e_i}
\]


where ps_i is an integer that represents the phylostratum of the gene i (for example, 
1, the oldest; 14, the youngest), e_i is the microarray signal intensity value (obtained 
from Agilent Zebrafish (V2) Gene Expression Microarrays) of the gene i that acts 
as weight factor and n is the total number of genes analysed. This way of calculating 
the index gives an increasingly stronger weight to younger phylostrata, thus com- 
pensating for the fact that the older phylostrata usually harbour the larger number 
of genes'*"*. 
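
As a minimal sketch of the weighted mean described above, the following Python function computes the TAI for one stage from a list of phylostratum ranks and matching signal intensities. The function name and the toy values are illustrative assumptions, not part of the published analysis.

```python
from typing import Sequence


def transcriptome_age_index(phylostrata: Sequence[int],
                            expression: Sequence[float]) -> float:
    """Expression-weighted mean of phylostratum ranks for one stage."""
    if len(phylostrata) != len(expression):
        raise ValueError("one phylostratum rank is needed per expression value")
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    # Numerator: sum of ps_i * e_i; denominator: sum of e_i.
    return sum(ps * e for ps, e in zip(phylostrata, expression)) / total


# Toy stage with three genes from ps1, ps7 and ps14 (made-up intensities).
ps = [1, 7, 14]
stage_expression = [100.0, 50.0, 50.0]
tai = transcriptome_age_index(ps, stage_expression)
print(tai)
```

A stage dominated by highly expressed old genes gives a low value, and a stage in which young genes carry more of the signal gives a higher one.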


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Fish keeping and sampling conditions. Zebrafish (Danio rerio) were kept in 12-l 
flow-through tanks at 26.5 °C (around 60 animals per tank). For accurate staging, 
fertilized eggs were collected within 15-min intervals and incubated in Petri dishes 
at 28.5 °C with water changes every 2-6 h. After hatching, larvae were transferred 
to 1-l tanks and kept at 28.5 °C. We took, in total, 72 samples in two replicates that 
correspond to 60 stages across zebrafish ontogeny (50 samples before the sex could 
be clearly recognized plus 11 samples of males and females each). Staging was done 
according to post-fertilization time. Embryos were additionally staged under the 
dissecting microscope according to ref. 31 to check for the consistency of post- 
fertilization timing and morphological development at standard temperature 
(28.5 °C). Only healthy animals that showed the expected morphological features 
for a given post-fertilization time were sampled. Each sample contained around 50 
individuals until the 1 day and 3 h embryo stage, 15 individuals until the 10 day 
larval stage, 10 individuals until the 18 day larval stage, 5 individuals until the 
45 day juvenile stage, whereas in later juvenile and adult stages we sampled males 
and females separately and each sample contained 2 individuals. All samples were 
snap frozen in liquid nitrogen and stored at −80 °C until RNA extraction. To 
avoid severe biases owing to the excess of unfertilized eggs, we squeezed eggs from 
adult females before freezing them in liquid nitrogen. 
Phylostratigraphy. A full account of phylostratigraphic analysis and theoretical 
underpinnings has been presented previously'®'*. The zebrafish genes of the 
present study (28,546, ENSEMBL release 56) were mapped on the currently best 
supported phylogeny using BLAST searches against the cleaned up and addition- 
ally enriched NCBI NR database, which represents the most exhaustive set of 
known proteins across all organisms. Our choice of internodes (phylostrata) in 
the consensus phylogeny depended on the availability of complete annotated 
genomes, reliability of phylogenetic relationships and on the importance of evolu- 
tionary transitions. Similarly, the data of Drosophila (13,389 genes), Anopheles 
(12,457 genes) and Caenorhabditis elegans (19,077 genes) were mapped to the best 
supported phylogenies that represent their evolutionary lineages. 
RNA isolation and microarray gene expression experiments. Total RNA was 
isolated using the TRIZOL plus protocol (Invitrogen). Four-hundred nanograms 
of total RNA per sample were Cy3 labelled according to the one-colour Quick 
Amp Labelling Kit protocol (Agilent). Labelled cRNAs were hybridized to Agilent 
Zebrafish (V2) Gene Expression Microarray slides (4 × 44k) for 17 h at 65 °C and 
washed according to the Agilent protocol. Hybridized microarray slides were 
scanned using an Agilent High-Resolution Microarray Scanner. 
Microarray data extraction, filtering and analysis. Raw microarray image files 
were processed and quality checked by Agilent’s Feature Extraction 10.7 Image 
Analysis Software. Background subtracted signal intensity values that contain cor- 
rection for multiplicative surface trends (gProcessedSignal) generated by Feature 
Extraction Software were used for further data analysis. Using GeneSpring micro- 
array data analysis software we filtered out probes that were flagged as non-uniform or 
as population outliers. For each of the 72 samples we calculated average signal 
intensity values over the two biological replicates. Probes (60 bp) were mapped 
on the Danio rerio transcripts (ENSEMBL version 54) that passed the phylostrati- 
graphic analysis (see below) using CD-hit software. This procedure yielded 16,188 
unique probes that collapsed to 12,892 ENSEMBL predicted genes. 
Phylostratigraphically mapped genes of Drosophila were linked to available 
microarray data**. This procedure yielded a data set of 3,550 genes. In a similar 
fashion, phylogenetically ranked microarray data sets were obtained for C. ele- 
gans” (16,832 genes) and Anopheles*® (3,135 genes). 
Transcriptome age index and statistical analysis. The TAI is the weighted mean 
of phylogenetic ranks (phylostrata) and is calculated for every ontogenetic stage s 
as follows: 

\[
\mathrm{TAI}_s = \frac{\sum_{i=1}^{n} ps_i \, e_i}{\sum_{i=1}^{n} e_i}
\]

where ps_i is an integer that represents the phylostratum of the gene i (for example, 1, 
the oldest; 14, the youngest), e_i is the microarray signal intensity value (obtained from 
Agilent Zebrafish (V2) Gene Expression Microarrays) of the gene i that acts as weight 
factor and n is the total number of genes analysed. This way of calculating the index 
gives an increasingly stronger weight to younger phylostrata, thus compensating for 
the fact that the older phylostrata usually harbour the larger number of genes'*"*. 
We chose to calculate the TAI index based on the amount of expression per 
gene, rather than by simply adding up whether a gene is expressed or not. Although 
this latter approach would also seem feasible, it runs into a technical problem. To 
say that a given gene is expressed or not, one would have to impose a cutoff on the 
signals from the microarrays, which is more or less arbitrary, as a weak signal on a 
microarray could be derived from a gene with very low expression level, or from a 
highly expressed gene that is present in a few cells only. Also, absolute quantities 
are difficult to compare across microarrays and a single cutoff value would not be 
appropriate (see below). On balance, we have therefore opted for the expression 
level as a numerator, also because one could argue that genes that are broadly 
expressed at high levels should be more relevant than specialized genes. 
The TAI formula can alternatively be written as: 


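\[
\mathrm{TAI}_s = ps_1 \frac{e_1}{e_1 + e_2 + \dots + e_n}
               + ps_2 \frac{e_2}{e_1 + e_2 + \dots + e_n}
               + \dots
               + ps_n \frac{e_n}{e_1 + e_2 + \dots + e_n}
\]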

The expression e_1 + e_2 + ... + e_n represents the total signal of the analysed probes on 
the microarray, whereas the ratio e_i/(e_1 + e_2 + ... + e_n), which can be denoted as f_i, 
represents the partial concentration (frequency) of probe i in the total microarray 
signal at a given stage; it is within a range between zero and one. It is important to 
note that the calculation of the partial concentration (f_i) inherently makes a global 
intensity normalization over the microarray experiment at a given stage and that at 
every stage the sum of partial concentrations will equal one. In many microarray 
studies it is common to assess the direction of expression change (over- or under- 
expression). This type of analysis requires that after the normalization procedure, 
which aims to remove noise from the experiment, expression signals that are mea- 
sured across experiments still reflect absolute number of mRNA molecules per unit 
of biological material. In such situations, if global intensity normalization is applied, 
it must be assumed that the total number of mRNA copies for all genes on the array 
does not significantly differ between experiments. Contrary to this common applica- 
tion of microarrays, in our study we are not interested in the direction of expression 
change of particular genes. Instead, we are looking at how partial concentrations of 
RNAs contribute to the overall transcriptome across stages. For this purpose it is 
irrelevant which part of the transcriptome is responsible for change of the partial 
concentration. Therefore in our data treatment it is not necessary to assume that 
cumulative signals do not differ between experiments. This shift in perspective 
greatly simplifies the analysis on the scale of the complete ontogeny because abund- 
ance and distribution of transcripts is commonly very different between stages. 

Thus, the TAI can be written as a sum of products between partial concentration 
and corresponding phylostratum: 


\[
\mathrm{TAI}_s = \sum_{i=1}^{n} ps_i f_i = ps_1 f_1 + ps_2 f_2 + \dots + ps_n f_n
\]
To assess the contributions of a specific phylostratum to the overall TAI (Figs 1b 
and 3b) we split the above total sum of ps_i f_i products into subsets of ps_i f_i sums where 
the value of ps_i (phylostratigraphic rank) was used as a grouping factor. 
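
The decomposition into partial concentrations and per-phylostratum contributions can be illustrated with a short Python sketch. The function names and toy intensities are assumptions made for illustration. The last lines check that multiplying all intensities by a constant leaves the result unchanged, which is the global normalization property noted above.

```python
from collections import defaultdict


def partial_concentrations(expression):
    """f_i = e_i / (e_1 + ... + e_n); the values sum to one."""
    total = sum(expression)
    return [e / total for e in expression]


def tai_contributions(phylostrata, expression):
    """Sum of ps_i * f_i within each phylostratum, keyed by phylostratum rank."""
    contrib = defaultdict(float)
    for ps, f in zip(phylostrata, partial_concentrations(expression)):
        contrib[ps] += ps * f
    return dict(contrib)


# Toy stage: two ps1 genes, one ps7 gene and one ps14 gene (made-up values).
ps = [1, 1, 7, 14]
expr = [60.0, 40.0, 50.0, 50.0]

contrib = tai_contributions(ps, expr)
tai = sum(contrib.values())          # the per-phylostratum sums add up to the TAI

# Rescaling every intensity by the same factor does not change the contributions.
scaled = tai_contributions(ps, [10.0 * e for e in expr])
print(contrib, tai)
```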

By applying repeated measures ANOVA on these ps_i f_i products we tested the 
significance of difference in TAI between stages. Repeated measures ANOVA was 
used because the same set of probes is measured at every stage, that is, there is 
dependence between the stages compared. Before the means of these products across 
stages were compared by ANOVA, we multiplied every ps_i f_i product by the constant n 
(total number of analysed probes). This transformation does not influence the 
ANOVA analysis and its sole purpose is that means of ps_i f_i products compared in 
ANOVA are equal to the corresponding TAI values. Because the assumption of 
sphericity was violated in the data sets analysed by repeated measures ANOVA, we 
applied the Greenhouse-Geisser correction. Multivariate test statistics, an alterna- 
tive approach that is not dependent on the assumption of sphericity, corroborated 
our statistical results of repeated measures ANOVA. We used the bootstrap 
approach (1,000 replicates) to assess the standard error of the weighted mean (TAI). 
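
A bootstrap standard error of this kind could be sketched as follows. The text does not state the resampling unit, so the sketch assumes that genes (pairs of rank and intensity) are resampled with replacement; the function names, seed and toy data are likewise illustrative assumptions. The sketch also checks the scaling remark above: the mean of the n-scaled ps_i f_i products equals the TAI.

```python
import random
import statistics


def tai(ps, expr):
    """Expression-weighted mean of phylostratum ranks."""
    return sum(p * e for p, e in zip(ps, expr)) / sum(expr)


def bootstrap_se_tai(ps, expr, replicates=1000, seed=0):
    """Standard deviation of the TAI over gene-level bootstrap resamples."""
    rng = random.Random(seed)
    n = len(ps)
    estimates = []
    for _ in range(replicates):
        # Assumption: resample genes (rank and intensity together) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(tai([ps[i] for i in idx], [expr[i] for i in idx]))
    return statistics.stdev(estimates)


# Made-up example data for a single stage.
ps = [1, 1, 2, 2, 5, 7, 12, 14]
expr = [900.0, 750.0, 400.0, 380.0, 120.0, 90.0, 40.0, 25.0]
se = bootstrap_se_tai(ps, expr, replicates=1000, seed=1)

# Scaling each ps_i * f_i product by n makes their mean equal to the TAI.
n, total = len(ps), sum(expr)
products = [n * p * e / total for p, e in zip(ps, expr)]
print(tai(ps, expr), statistics.mean(products), se)
```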

Relative expression of the genes for a given phylostratum (ps) and devel- 
opmental stage (s) (Fig. 2) was calculated according to the equation: 


\[
\mathrm{RE}(ps)_s = \frac{f_s - f_{\min}}{f_{\max} - f_{\min}}
\]
where f_s is the average partial concentration of RNAs from phylostratum ps for a 
given stage s and f_max, f_min are the maximal and minimal average partial concentra- 
tion from phylostratum ps across all considered stages, respectively. 
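
Assuming the relative expression is the usual min-max rescaling of a phylostratum's average partial concentration across stages, (f_s - f_min)/(f_max - f_min), it could be computed as in the sketch below. The function name and the four-stage profile are hypothetical.

```python
def relative_expression(profile):
    """Rescale one phylostratum's stage profile so that min -> 0 and max -> 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        raise ValueError("profile is constant; relative expression is undefined")
    return [(f - lo) / (hi - lo) for f in profile]


# Made-up average partial concentrations of one phylostratum at four stages.
profile = [0.02, 0.05, 0.03, 0.04]
re = relative_expression(profile)
print(re)
```

With this convention the stage of highest expression maps to 1 and the stage of lowest expression to 0, so profiles of phylostrata with very different absolute levels can be compared on one axis.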


31. Kimmel, C. B., Ballard, W. W., Kimmel, S. R., Ullmann, B. & Schilling, T. F. Stages of 
embryonic development of the zebrafish. Dev. Dyn. 203, 253-310 (1995). 

32. Efron, B. & Tibshirani, R. Bootstrap methods for standard errors, confidence 
intervals, and other measures of statistical accuracy. Stat. Sci. 1, 54-75 (1986). 

Interdependence of behavioural variability and 
response to small stimuli in bacteria 


Heungwon Park', William Pontius’, Calin C. Guet?, John F. Marko’, Thierry Emonet? & Philippe Cluzel? 


The chemotaxis signalling network in Escherichia coli that controls 
the locomotion of bacteria is a classic model system for signal trans- 
duction’’. This pathway modulates the behaviour of flagellar 
motors to propel bacteria towards sources of chemical attractants. 
Although this system relaxes to a steady state in response to environ- 
mental changes, the signalling events within the chemotaxis net- 
work are noisy and cause large temporal variations of the motor 
behaviour even in the absence of stimulus’. That the same signalling 
network governs both behavioural variability and cellular response 
raises the question of whether these two traits are independent. 
Here, we experimentally establish a fluctuation-response relation- 
ship in the chemotaxis system of living bacteria. Using this relation- 
ship, we demonstrate the possibility of inferring the cellular 
response from the behavioural variability measured before stimu- 
lus. In monitoring the pre- and post-stimulus switching behaviour 
of individual bacterial motors, we found that variability scales lin- 
early with the response time for different functioning states of the 
cell. This study highlights that the fundamental relationship 
between fluctuation and response is not constrained to physical 
systems at thermodynamic equilibrium‘ but is extensible to living 
cells’. Such a relationship not only implies that behavioural vari- 
ability and cellular response can be coupled traits, but it also pro- 
vides a general framework within which we can examine how the 
selection of a network design shapes this interdependence. 

It is standard procedure to characterize the stochastic dynamics of 
physical systems in thermodynamic equilibrium by measuring spon- 
taneous fluctuations and responses to small external perturbations. 
Because these two distinct measurements contain the same informa- 
tion, they are related by the fluctuation-dissipation theorem*. Although 
the fluctuation-dissipation theorem has practical applications—to 
evaluate force-extension sensors for single biomolecules®”’ and to pre- 
dict static cell-to-cell variability of gene expression®”—it has not been 
possible to apply it directly to the study of the dynamical behaviour of 
living cells because they are open systems with significant non- 
thermal dynamics. However, this theorem has recently been extended 
to a fluctuation-response theorem (FRT) for systems that are not in 
thermodynamic equilibrium but that have a well-defined steady state 
and Markovian dynamics*'®. For application to living cells this con- 
dition amounts to studying dynamic processes with sufficiently short 
‘memory’ that they can relax to a well-defined steady state. Here we use 
the FRT as an operational framework to establish the interdependence 
of distinct cellular traits, such as cellular fluctuations and response to a 
small stimulus, without relying on the biochemical details of a specific 
signalling pathway. To tackle this question experimentally, we used the 
well-characterized chemotaxis system in E. coli, which governs bacterial 
locomotion”. 

The chemotaxis network regulates the rotation direction—clockwise 
(CW) or counter-clockwise (CCW)—of the flagellar motors, which 
control the swimming direction of the cell”. One of the hallmarks of 


bacterial chemotaxis is adaptation. Following a stepwise stimulus, the 
CW bias (the probability that the motor will rotate clockwise) decreases 
abruptly, before slowly adapting back to its pre-stimulus level. Even 
when bacteria are adapted to their environment, the CW bias of indi- 
vidual cells fluctuates around the mean. These temporal fluctuations in 
CW bias reflect slow fluctuations in signalling events throughout the 
transduction network". To verify that the bacterial chemotaxis system 
satisfies the FRT, we monitored both the temporal fluctuations of the 
CW bias before stimulus and the cellular response to a small stimulus at 
the single-cell level. Both quantities were obtained from the time series 
of CW and CCW intervals of individual motors from bacteria immo- 
bilized on a glass coverslip’ and submerged in a motility medium that 
does not support growth. Such single-cell experiments are complicated 
by inherent cell-to-cell differences in relative chemotaxis protein con- 
centration, leading to differences in switching dynamics (Fig. 1a). To 
compare cells with similar behaviour, we sorted wild-type cells accord- 
ing to their steady-state CW bias (Fig. 1a). These CW bias bins define 
different classes of cells, which, despite being genetically identical, have 
different dynamics and must be analysed separately’. 
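
The sorting step can be sketched in Python using the bin boundaries listed in the legend of Fig. 1a (A to I). Two points are assumptions made for illustration: the CW bias is estimated as the fraction of recorded time spent in CW rotation, and each bin is treated as closed on the left and open on the right. The function names are hypothetical.

```python
from bisect import bisect_right

# Bin boundaries and labels as listed in the Fig. 1a legend.
BIN_EDGES = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]
BIN_LABELS = "ABCDEFGHI"


def cw_bias(cw_durations, ccw_durations):
    """Fraction of time spent rotating clockwise (assumed estimator)."""
    cw = sum(cw_durations)
    total = cw + sum(ccw_durations)
    if total <= 0:
        raise ValueError("no recorded intervals")
    return cw / total


def cw_bias_bin(bias):
    """Return the bin label A-I, or None when the bias lies outside 0.00-0.60."""
    if not (BIN_EDGES[0] <= bias < BIN_EDGES[-1]):
        return None
    # Assumption: intervals are [lower, upper), so a bias of 0.20 falls in bin E.
    return BIN_LABELS[bisect_right(BIN_EDGES, bias) - 1]


print(cw_bias_bin(0.17), cw_bias_bin(0.45))
```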

First, we quantified the response in single cells by measuring the 
length of successive CCW intervals immediately following the stimu- 
lus. The stimulus (10 nM of aspartate) used in this study is small and 
close to the limit of sensitivity of the bacterial chemotaxis system’®. At 
the single-cell level, the length of the first CCW interval following the 
small stimulus (Supplementary Fig. 1a) was distributed around the 
mean CCW interval length before stimulus (Supplementary Fig. 1b). 
Given that CCW interval length is a stochastic variable, we averaged 
the CCW interval lengths after stimulus between cells and found that 
the mean length of the first CCW interval following stimulus was 
slightly longer than the mean pre-stimulus CCW interval length 
(Fig. 1b). Therefore, we expected the response of the system to be 
within the linear regime, which was necessary to apply the FRT. We 
also tested the response of the chemotaxis system for a stimulus 100 
times larger (1 µM aspartate). Surprisingly, the second CCW interval 
following the stimulus returned to near pre-stimulus length for both 
large and small attractant concentrations (Fig. 1c). Although the 
cellular response to stimulus extends in some cases beyond the second 
interval (Supplementary Fig. 1d, e), these results qualitatively indicate 
that the first CCW interval contains most of the chemotactic response 
to both small and large stimuli. 

To characterize the system quantitatively, we defined the response 
time of a single cell as the cumulative length of post-stimulus CCW 
intervals that are strictly longer than the mean CCW interval length 
before stimulus (Fig. 1b, c and Supplementary Fig. 1e; see Methods for 
definition of response time). This procedure yields a reasonable estimate 
of the response time under the condition of small stimulus (Supplemen- 
tary Fig. 2). We found that the response time averaged over CW bias 
bins decreased with CW bias for both small (Fig. 2a) and large stimuli 
(Fig. 2a, inset). Because all cells returned to their pre-stimulus behaviour 

Figure 1 | CCW interval lengths pre- and post-stimulus. a, Histogram of 
CW bias of wild-type RP437 cells. We sorted cells into CW bias intervals by 
their pre-stimulus CW bias: 0.00-0.05 (A), 0.05-0.10 (B), 0.10-0.15 (C), 0.15- 
0.20 (D), 0.20-0.25 (E), 0.25-0.30 (F), 0.30-0.40 (G), 0.40-0.50 (H) and 0.50- 
0.60 (I). Grey bars are cells representative of wild-type behaviour. To increase 
the chance of obtaining cells with CW bias higher than 0.2, we transformed 
wild-type cells with pZE21-CheR (Methods). This extended the range of CW 
bias considered in our study to values greater than 0.4: bins H and I (not 
shown). b, c, The first (b) and second (c) mean post-stimulus CCW interval 
lengths versus pre-stimulus CW bias for all cells (wild-type RP437 and RP437 
expressing CheR from pZE21-CheR). (See Supplementary Fig. 1 for individual 
cells.) Black circles, cells exposed to a small stimulus (10 nM L-aspartate). Grey 
triangles, cells exposed to a large stimulus (1 µM L-aspartate). Error bars show 
the standard error associated with the average CCW interval length in each 
bin. Dark grey dashed line, geometric mean of the CCW interval lengths 
following a randomly chosen time point in non-stimulated cells. Black line, 
power-law fit of the geometric mean of pre-stimulus CCW interval lengths 
calculated over 1,500 for all cells (wild-type RP437 and RP437 expressing 
extra CheR from pZE21-CheR) as a function of the pre-stimulus CW bias 
(Supplementary Fig. 1b). 


(Supplementary Fig. 1), the system exhibited near-precise adaptation at 
the single-cell level, regardless of CW bias (Supplementary Fig. 3). This 
result agrees with that obtained from population measurements'”* and 
shows that the dynamics have sufficiently short ‘memory’ and that 
individual cells can relax to a well-defined steady state. 
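
The single-cell response time defined earlier (the cumulative length of post-stimulus CCW intervals that are strictly longer than the mean pre-stimulus CCW interval) might be computed as in the sketch below. The exact procedure is given in the Methods, which are not reproduced here, so one reading is assumed: intervals are accumulated in order from the stimulus and the sum stops at the first interval that is not longer than the pre-stimulus mean. The function name and example values are hypothetical, and the pre-stimulus mean is passed in so that no choice between arithmetic and geometric mean is implied.

```python
def response_time(post_stimulus_ccw, pre_stimulus_mean):
    """Cumulative length of the leading post-stimulus CCW intervals that
    exceed the pre-stimulus mean (assumed reading of the definition)."""
    total = 0.0
    for length in post_stimulus_ccw:
        if length <= pre_stimulus_mean:
            break                      # the cell is taken to have adapted
        total += length
    return total


# Made-up interval lengths in seconds.
print(response_time([4.0, 2.5, 1.0, 3.0], pre_stimulus_mean=2.0))
print(response_time([1.5, 5.0], pre_stimulus_mean=2.0))
```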

A direct consequence of the linear approximation is that the res- 
ponse time of the system to a small external stimulus should be pro- 
portional to the correlation time of the spontaneous fluctuations 
before stimulus. Using serial correlation analysis’’’®, we evaluated 

Figure 2 | Relationship between response to stimulus and fluctuations 
before stimulus. a, Average response time for all cells (wild-type RP437 and 
RP437 expressing extra CheR from pZE21-CheR) exposed to a stepwise small 
stimulus (black circles, 10 nM L-aspartate) or large stimulus (grey triangles in 
inset to a, 1 µM L-aspartate). The letters correspond to the CW bias bins 

(Fig. 1a). Error bars show the standard error associated with the average 
response time within each bin. b, Average response time to a small stimulus 
(black circles) or large stimulus (grey triangles in inset to b) as a function of the 
correlation time for all cells (wild-type RP437 and RP437 expressing CheR from 
pZE21-CheR). For the large stimulus, the average response time was adjusted 
by a correction factor (Supplementary Fig. 2c). The solid lines are linear fit 
functions forced through the origin. For the black line: response 

time = C × correlation time. C ~ 0.98 ± 0.10 (R² = 0.75). For the grey line in 
the inset: relaxation time = C × correlation time. C ~ 12.23 ± 1.83 (R² = 0.07). 
Error bars for the correlation time are the half-lengths of the first uncorrelated 
CCW intervals. Error bars for the response time are the standard error 
associated with the average response time within each bin. Grey area, 
representative behaviour of a wild-type population. Insets in a and b share axes 
with the main panels. 


the correlation time in non-stimulated cells (Supplementary Fig. 4). 
In agreement with our assumption of linear dynamics”' and the general 
prediction of the FRT, we found that the correlation time scales linearly 
with the response time to small stimulus (R² = 0.75; Fig. 2b) whereas to 
large stimulus it scales poorly (R² = 0.07; Fig. 2b, inset). This result has 
an important practical implication: the response time that governs the 
cellular response in chemotaxis can be experimentally inferred by 
measuring the temporal correlations in behavioural fluctuations from 
cells before stimulus. 
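
The proportionality tested here corresponds to a least-squares line forced through the origin, whose slope is the sum of x·y divided by the sum of x². A minimal Python sketch follows; the function name and data points are invented for illustration and are not the measured values.

```python
def fit_through_origin(x, y):
    """Least-squares slope C of y = C * x with no intercept term."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


# Invented correlation times and response times (seconds) for three bins.
correlation_time = [1.0, 2.0, 3.0]
resp_time = [1.1, 1.9, 3.2]
slope = fit_through_origin(correlation_time, resp_time)
print(slope)
```

A slope close to one, as reported for the small stimulus, means the response time and the pre-stimulus correlation time are nearly equal rather than merely proportional.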

Cellular behavioural variability can also be defined by the amplitude 
of the noise rather than its temporal correlations. To characterize the 
amplitude of the output noise of the chemotaxis network, we com- 
puted the power spectral density of the switching binary time series 
measured from individual motors before stimulus (Fig. 3a and Sup- 
plementary Fig. 5). We evaluated the low-frequency noise by integrat- 
ing the power spectrum between f = 1/1,500 s⁻¹ and f = 1/10 s⁻¹. In 
this frequency range, the temporal fluctuations are putatively caused 
by the slow methylation-demethylation of the receptor-kinase com- 
plexes that are also controlling the adaptive process’*. Two elements 

Figure 3 | Low-frequency noise in non-stimulated cells. a, Low-frequency 


noise in individual wild-type RP437 cells (black) and RP437 cells expressing 
CheR from pZE21-CheR (grey) versus CW bias. The inset shows power spectral 
density as a function of noise frequency. Black line, power density averaged over 
all cells (wild-type RP437 and RP437 expressing CheR from pZE21-CheR) with 
CW bias = 0.15-0.20. Dark grey line, power density of the motor decoupled 
from the signalling network’. We determined the low-frequency noise for the 
region between the dotted lines. See Supplementary Fig. 5 for all CW bias bins. 
b, Signalling noise as a function of CW bias for wild-type RP437 cells and 
RP437 cells expressing CheR from pZE21-CheR. Signalling noise is defined as 
the variance σ²_CheY-P of the fluctuating [CheY-P]. Letters correspond to the CW 
bias bins (Fig. 1a). The power spectral densities and CW biases are averaged 
over cells within the same CW bias. Error bars show the standard error 
associated with the estimated signalling noise within each bin. 


contribute to the observed output noise: the spontaneous noise asso- 
ciated with the signalling events of the chemotaxis network and the 
stochastic switching behaviour of the bacterial motor (Fig. 3a). The 
binary nature of the switching behaviour of the motor dominates the 
variance of the noise and masks the signalling noise within the 
chemotaxis network, the output signalling molecule of which is the 
phosphorylated form of the signalling protein CheY'*. The active 
form, CheY-P, binds to the sensory basal part of the flagellar rotary 
motor and induces CW rotation. Using a procedure developed in ref. 
22, we decoupled the signalling noise, σ²_CheY-P, from that of the motor. 
We then found that the signalling noise decreased with the CW bias 
(Fig. 3b). 
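
The low-frequency output noise described above, obtained by integrating the power spectrum of the switching time series between 1/1,500 and 1/10 s⁻¹, can be sketched with a plain periodogram. This is an illustration under stated assumptions: the series is regularly sampled, a simple one-sided periodogram stands in for whatever spectral estimator was actually used, and the function name is hypothetical. It yields the total output noise in the band and does not perform the decoupling of signalling noise from motor noise, which follows the procedure of ref. 22.

```python
import cmath
import math


def low_frequency_noise(signal, dt, f_lo=1 / 1500, f_hi=1 / 10):
    """Integrate a one-sided periodogram of `signal` between f_lo and f_hi (Hz).

    `signal` is a regularly sampled series (for a motor, 1 for CW and 0 for
    CCW); `dt` is the sampling interval in seconds.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    df = 1.0 / (n * dt)                      # frequency resolution
    power = 0.0
    for k in range(1, (n - 1) // 2 + 1):     # positive frequencies below Nyquist
        f = k * df
        if f < f_lo or f > f_hi:
            continue
        # Discrete Fourier coefficient at frequency f (naive sum per bin).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(centred))
        # One-sided spectral density times df reduces to 2|X_k|^2 / n^2.
        power += 2.0 * abs(coeff) ** 2 / n ** 2
    return power


# Check on a synthetic trace: a slow sine inside the band plus a fast one outside.
n, dt = 1500, 1.0
slow = [math.sin(2 * math.pi * 15 * j / n) for j in range(n)]    # 0.01 Hz
fast = [math.sin(2 * math.pi * 450 * j / n) for j in range(n)]   # 0.3 Hz
trace = [s + f for s, f in zip(slow, fast)]
band_power = low_frequency_noise(trace, dt)
print(band_power)
```

Only the slow component contributes to the band, so the result is close to its variance of one half; a binary CW/CCW series would be passed in the same way.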

Operationally, we used a simplified expression of the FRT, in which 
the response function of the chemotaxis system μ(t) and the auto- 
correlation function C(t) of the spontaneous fluctuations of the cellular 
behaviour should be related by 

\[
\mu(t) = -K \frac{\mathrm{d}}{\mathrm{d}t} C(t).
\]

Here, the fluctuation-response coupling coefficient K may depend on the genetic background, 

Figure 4 | Relationship between signalling noise and response time to a 
small external stimulus. a, Mean coupling coefficient 1/⟨K(ω)⟩_ω for each 
CW bias bin. We computed the geometric mean over frequencies ranging 
from 1/1,500 s⁻¹ to 1/20 s⁻¹, represented by the dashed lines in the inset to 
a. We found that the coupling coefficient K for the small stimulus was constant 
at long timescales for frequencies in this range (see also Supplementary Fig. 
7a). The standard error of the mean is smaller than the symbol size except for 
the highest CW bias bin I. The line is the mean value of 1/⟨K(ω)⟩_ω computed 
over CW biases ranging from 0.00 to 0.5. The inset to a shows 1/K(ω) for cells 
with a CW bias ranging from 0.15 to 0.20 (10 nM L-aspartate increase). For 
large stimulus K is not constant (see Supplementary Fig. 7b). b, Average 
response times of all cells (wild-type RP437 and RP437 expressing inducible 
CheR) to small stimulus (black circles) or large stimulus (grey triangles in inset 
to b) versus mean pre-stimulus signalling noise. Solid lines are linear fits 
forced through the origin. Response time = C × σ²_CheY-P. Black line: 
C = 259 ± 25 s µM⁻² (R² = 0.8) for small stimulus. Grey line in inset to 
b: C = 3,215 ± 307 s µM⁻² (R² = 0.4) for large stimulus. Grey area, 
representative behaviour of a wild-type population. The inset in b shares axes 
with the main panel. c, The correlation time as a function of the mean 
signalling noise before stimulus for all cells (wild-type RP437 and RP437 
expressing CheR from pZE21-CheR). Black line, linear fit function forced 
through the origin. Correlation time = C × σ²_CheY-P. C = 257 ± 21 s µM⁻² 
(R² = 0.9). Letters correspond to the CW bias bins (Fig. 1a). Error bars for the 
correlation time are the average half-lengths of the first uncorrelated CCW 
intervals. Error bars for the signalling noise are the standard error associated 
with the signalling noise in each bin. Grey area is representative behaviour of a 
wild-type population. 
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growth conditions, and functional state of the cell. We plotted the 
2Im|[f()| 
wP(w) 
P(@) is the power spectral density of the spontaneous fluctuations 
(Fig. 4a and Supplementary Figs 6 and 7). In the most general non- 
equilibrium case, the coupling coefficient K may change when the 
genetic background or the growth conditions are modified. In chemo- 
taxis, we found that the value of the coupling coefficient K(q) is 
independent of the functioning states of the cell and levels of expres- 
sion of the chemotaxis proteins (Fig. 4a). This result is remarkable 
because most of the chemotaxis network has highly nonlinear signal 
processing”***, 

It is usual to consider that noise is an independent limiting factor in 
intracellular signalling and that evolution selects network designs to 
reduce it”. However, using the framework of the FRT, we asked 
whether the temporal fluctuations in the switching rate of the motor 
and the cellular response are ever dynamically coupled. Remarkably, 
we found that the response time to a small external stimulus scaled 
linearly with the signalling noise from the chemotaxis network in cells 
before stimulus (R* = 0.8; Fig. 4b), which was consequently linearly 
related to the correlation time (R? = 0.9; Fig. 4c). Furthermore, we 
found that the response time to a large stimulus scaled poorly 
(R* = 0.4) with the signalling noise, reflecting that for large stimulus, 
the system operates outside the regime of linear approximation 
(Fig. 4b, inset). 

We interpret this observation in simple mathematical terms, where 
the fluctuations in the network output, dchey-p, about its average 
have linearized kinetics in the form of a Langevin equation”'”*: 
© Senet =— “Scher + VDon(t), where VD6én(t) is a white-noise 


source with intensity D and t is the measured correlation time in the 
output of the signalling system. In this coarse-grained picture, there 
should exist a strict relationship between the signalling output noise 
amplitude ocyey-p and the time t, where Gers =(D/2)t. Although 
the coefficient D could potentially depend on intracellular parameters 
in a complex way, our experiments surprisingly showed that two cel- 
lular traits, o°Chey-p and the response time, are linearly coupled. This 
observation implies that the coefficient D remains approximately con- 
stant over a wide range of functioning states of the cell (that is, CW 
bias). This result is consistent with the fact that the coefficient (K(@)),,, 
(Fig. 4a) determines the behaviour of D, because (K(@)),,0c1/D. 
Consequently, we anticipate that below an upper bound imposed 
mainly by rotational diffusion”, cells with the largest behavioural 
variability before stimulus would also exhibit the strongest chemotactic 
drift in response to an external stimulus”. 

Although the FRT predicts the existence of a coupling between 
cellular response and noise, it does not specify how this coupling 
depends on the different states of the cell. Therefore, we hypothesize 
that the specific design of the signalling pathways could govern such 
interdependence. We find that a simple kinetic model and experi- 
mental data support this hypothesis (Supplementary Fig. 8): in che- 
motaxis, the value of the coefficient D is governed by the adaptation 
mechanism that uses the classic futile cycle’! as a core module in which 
two antagonistic enzymes regulate the activity of the kinase-receptor 
complexes. Because the futile cycle is a design shared by a large class of 
signalling pathways*'”*”, it raises the possibility that for these systems, 
noise and cellular response are coupled in a similar way. To gain 
general insights into the selection of a specific coupling, we should 
examine how certain classes of design and function of networks may 
constrain the behaviour of this interdependence”. 


coefficient K(w)= — as a function of CW bias, where 
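The relationship between noise amplitude and correlation time in the Langevin picture above can be checked numerically. The following is a minimal sketch, not the authors' analysis: it integrates a generic linear Langevin equation with the Euler-Maruyama scheme and compares the sample variance with the stationary prediction (D/2)τ. The function name, the value of D, the time step and the run length are all illustrative assumptions.

```python
import math
import random


def simulate_ou_variance(tau, noise_intensity, dt, n_steps, seed=0):
    """Integrate dx/dt = -x/tau + sqrt(D)*eta(t) and return the sample variance.

    Euler-Maruyama scheme; eta(t) is unit white noise, D = noise_intensity.
    The stationary prediction to compare against is (D / 2) * tau.
    """
    rng = random.Random(seed)
    kick = math.sqrt(noise_intensity * dt)  # std of the noise increment per step
    x = 0.0
    total = 0.0
    total_sq = 0.0
    for _ in range(n_steps):
        x += -x / tau * dt + kick * rng.gauss(0.0, 1.0)
        total += x
        total_sq += x * x
    mean = total / n_steps
    return total_sq / n_steps - mean * mean


if __name__ == "__main__":
    D = 0.02  # hypothetical noise intensity, arbitrary units
    for tau in (5.0, 10.0, 20.0):
        variance = simulate_ou_variance(tau, D, dt=tau / 100.0, n_steps=400_000)
        print(f"tau={tau:5.1f}  simulated={variance:.4f}  predicted={D / 2 * tau:.4f}")
```

With D held fixed, the simulated variance grows in proportion to τ, which is the linear coupling between noise and correlation time discussed in the text.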


METHODS SUMMARY 

Response time. For each cell (whose behaviour is defined by a specific CW bias 
bin), the response time was measured from the time of stimulus through all 
successive averaged CCW intervals that were longer than the mean pre-stimulus 
CCW interval length. This mean was obtained by averaging together the CCW
interval lengths chosen at random time points within the binary time series of the
non-stimulated cell. 

Correlation time. To determine the correlation time of the CCW sequences, we
used serial correlation coefficients (Supplementary Fig. 4c) for the CCW interval
lengths. We converted the number of correlated sequences to real correlation
time lengths, including the half-length of the first uncorrelated CCW interval.
To determine whether the sequences in each lag (the number of preceding CCW
intervals) were correlated, we used the Wilcoxon rank sum test (the “ranksum”
Matlab function) at a significance level of P = 0.01 (Supplementary Fig. 4d), as in
ref. 20. We considered the first non-zero lag that had h = 0 (that is, the test did
not reject the null hypothesis) as the end of the correlation.

Low-frequency noise and motor noise. We define the low-frequency noise
$N_i^{\mathrm{lf}}$ of the ith cell as the integrated power density $P_i(f)$ of the binary
time series from $f_1$ = 1/1,500 s⁻¹ to $f_2$ = 1/10 s⁻¹, which is
$N_i^{\mathrm{lf}} = \int_{f_1}^{f_2} P_i(f)\,df$ (Fig. 3a). We define the low-frequency motor
noise $N_i^{\mathrm{lf,m}}$ as the integrated flat baseline of the power density (Fig. 3a,
dark grey line) on the same timescale. We estimated signalling
noise from the average experimental power spectral density, the average CW bias,
and the gain function between the input signal (steady-state [CheY-P]) and output
signal (average CW bias) using methods introduced by ref. 22 (Methods).


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS


Strains and plasmids. RP437 is a wild-type E. coli strain for chemotaxis. To
construct pZE21-CheR, we amplified cheR using polymerase chain reaction (PCR) 
from the chromosome of the RP437 strain with the following primers: CheR- 
KpnI-5’: 5'-gce get acc atg act tca tca tct ctg ccc tg-3’ and CheR-HindIII-3’: 
5'-cgc aag ctt tta atc ctt act tag cgc at-3'. The gene fragment was inserted in the 
KpnI and HindIII sites of a pZE21 series plasmid that contained a kanamycin
resistance cassette and a TetR inducible promoter. The plasmid pZS4-Int1 encodes
tetR under a constitutive promoter, which modulates the expression of the
TetR-regulated cheR construct. This plasmid carries a spectinomycin resistance gene.
Wild-type cells with and without plasmid exhibited similar noise levels (Fig. 3a) 
and CCW interval lengths after stimulus (Supplementary Fig. 1a, c and d) at the 
single-cell level. 

HPLC calibration of the release of aspartate. We prepared 10 µl samples of
0.5 mM caged L-aspartate solution under the same conditions as for the chemotaxis
experiments and illuminated them with intense ultraviolet light from a Xenon
flash lamp (built-in L7685 reflective mirror, 60 W, Hamamatsu). We estimated the
relative concentration of the caged L-aspartate in each sample by the
high-performance liquid chromatography (HPLC) peak area. By comparing the
decreasing HPLC peak area with its initial peak area, we found the released
L-aspartate concentration as a function of the number of ultraviolet flashes
(Supplementary Fig. 9). The samples released about 1 µM L-aspartate per ultraviolet
flash. The HPLC gradient conditions had five steps: (1) equilibrium with 20% 
acetonitrile, 0.1% TFA/80% water, 0.1% TFA; (2) gradient of 20-55% acetonitrile 
over 30 min; (3) first washing with 55-90% acetonitrile for 20 min; (4) second 
washing with 90% acetonitrile for 5 min; and (5) equilibrium with 20% acetonitrile, 
0.1% TFA/80% water, 0.1% TFA. 

Photo-release and single-cell assay. We sheared the flagella of the cells by slowly
forcing them through a thin needle (inner diameter = 0.19 mm, 27 G 2, B-D) 40
times. Cultures grew overnight in 3 ml of tryptone broth at 35 °C with shaking at
200 r.p.m. We transferred the overnight cultures to a 250 ml flask, in which we
diluted them 1:50 in 12 ml tryptone broth and grew the cells again at 35 °C at
200 r.p.m. To obtain cells with different CW biases, we induced plasmid
expression with various concentrations of anhydrotetracycline (0-2.5 ng ml⁻¹) in the
diluted overnight cultures. The media also contained the antibiotic specific to the
plasmid. We harvested the cells when the absorbance A reached ~0.3 at 600 nm.
We washed the cells and resuspended them in motility medium (0.1 mM EDTA,
0.1 mM L-methionine, 10 mM potassium phosphate pH 7.0). We prepared glass
slides (No. 1/2, 18 mm, Corning) coated with poly-L-lysine and a solution of beads
(Polybead Amino 1.0 µm Microspheres, Polysciences) coated with rabbit antibodies
against flagella. We mixed the cells (4-5 µl) with the beads (4-5 µl) and incubated
them for 20 min at room temperature (21-22 °C). This process caused the cell bodies
to stick to the glass slide and the beads to attach to the flagella. Although the
probability of a bead attaching to a rotating flagellum was low, we consistently
obtained a few labelled flagella in each sample. After incubation we removed the
unattached cells and beads and then added 8 µl of 5 µM (for small stimulus) or
500 µM (for large stimulus) caged L-aspartate solution to the sample medium. We
covered the sample with oil (immersion oil transparent to ultraviolet: type FF,
Cargille Laboratories) to prevent evaporation. We placed the sample under a
dark-field condenser to produce a bright red image of the bead. Harmful blue light
was filtered out by a long-pass filter (NT52-543, Edmund Industrial). We observed
the samples under an Olympus IX71 microscope with a 100× oil immersion
objective (numerical aperture = 1.3, Olympus Uplan FI, oil iris ∞/0.17). We recorded
the long circular motions of individual beads attached to rotating flagella of single
cells through a four-quadrant photomultiplier (type: R5900U-01-M4, Hamamatsu).
The signal from the photomultiplier, a four-voltage time series, was monitored with
a PC via LabView software (National Instruments). The rotation of the
bead was simultaneously recorded using a charge-coupled device camera (1/3″
midresolution Exview digital B/W camera, Sony). We converted the signal to a
binary time series indicating transitions between CCW and CW rotations. After
1,500 s (or 300 s) of recording the rotational motion of the bead, we
photo-released the caged aspartate (caged L-aspartic acid, sodium salt (189110):
N-[1-(2-nitrophenyl)ethyloxycarbonyl]aspartic acid, sodium, C13H13N2O8·Na; relative
molecular mass 348.2 and molar absorption ε = 4,710 M⁻¹ cm⁻¹ at maximum
wavelength λmax = 264 nm; from Calbiochem or synthesized by D. Trentham,
G. Reid and J. Corrie). We illuminated the sample with an intense ultraviolet light
from the Xenon flash coupled into a light guide (A2873, quartz glass fibre,
Hamamatsu) and widely focused onto the whole sample with two
ultraviolet-coated lenses (focal length = 35 mm and diameter = 25.4 mm; focal length = 20
mm and diameter = 12.7 mm, ThorLabs). These ultraviolet flashes produced a
stepwise release of 1 µM (or 10 nM) L-aspartate from the 0.5 mM (or 5 µM) caged
L-aspartate. The magnitude of the stepwise stimulus corresponds to the typical
increase in attractant concentration encountered by bacteria swimming in a
gradient of 1 nM µm⁻¹ (refs 34 and 35).

Definition of CW bias. We define $T^{\mathrm{CW}}_{i,j}$ and $T^{\mathrm{CCW}}_{i,j}$ as the durations of the
jth CW and CCW intervals of the ith cell. The CW bias for the jth CW-to-CCW
interval pair of the ith cell is $b_{i,j} = T^{\mathrm{CW}}_{i,j} / (T^{\mathrm{CW}}_{i,j} + T^{\mathrm{CCW}}_{i,j})$. The
pre-stimulus CW bias of the ith cell, $\langle b_{i,j}\rangle_{\mathrm{before}}$, is the time average of $b_{i,j}$
over a time window of length $t_{i,\mathrm{before}}$ preceding the stimulus. $t_{i,\mathrm{before}}$ was
300 s for the cells with CW bias exceeding 0.25 responding to the large stimulus
and 1,500 s for all other cells. Similarly, the post-stimulus CW bias of the ith
cell, $\langle b_{i,j}\rangle_{\mathrm{after}}$, is the temporal average of $b_{i,j}$ over a time window of
duration $t_{i,\mathrm{after}}$ seconds following the stimulus. For the small (or
large) stimulus, the first two (or 200) CW-CCW interval pairs following stimulus
were not included. $t_{i,\mathrm{after}}$ was 1,500 s for small stimuli, 900 s for large stimuli
and CW bias <0.25, and 300 s for large stimuli and CW bias >0.25.
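The definition above can be sketched in a few lines of code. This is an illustrative sketch only: the function names are hypothetical, and reading the "time average" as an average weighted by the duration of each CW-CCW pair is an assumption, not something the text specifies.

```python
def pair_biases(cw_durations, ccw_durations):
    """CW bias of each CW-to-CCW interval pair: T_cw / (T_cw + T_ccw)."""
    return [cw / (cw + ccw) for cw, ccw in zip(cw_durations, ccw_durations)]


def time_averaged_bias(cw_durations, ccw_durations, skip_pairs=0):
    """Average the pair biases, weighting each pair by its total duration.

    skip_pairs drops the first pairs, as is done for the pairs that
    immediately follow a stimulus. The duration weighting is an assumption.
    """
    cw = cw_durations[skip_pairs:]
    ccw = ccw_durations[skip_pairs:]
    weights = [a + b for a, b in zip(cw, ccw)]
    biases = pair_biases(cw, ccw)
    return sum(w * b for w, b in zip(weights, biases)) / sum(weights)


if __name__ == "__main__":
    cw = [0.4, 0.6, 0.5]   # hypothetical CW interval durations, in seconds
    ccw = [1.6, 1.4, 2.5]  # hypothetical CCW interval durations, in seconds
    print(pair_biases(cw, ccw))
    print(time_averaged_bias(cw, ccw, skip_pairs=1))
```

With duration weighting, the average reduces to the total time spent in CW rotation divided by the total recording time in the window.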

Response time. For each cell (the behaviour of which is defined by a specific CW 
bias bin), the response time was measured from the time of stimulus through all 
successive averaged CCW intervals that were longer than the mean pre-stimulus 
CCW interval length. This mean was obtained by averaging together the CCW 
interval lengths chosen at random time points within the binary time series of the 
non-stimulated cell. If the response time included more than one CCW interval, 
the CW interval length between two successive CCW intervals was also included in 
the response time. To get the final response time, we subtracted the mean non- 
stimulated portion of the first responding CCW interval. For example, if the third 
CCW interval is the last CCW interval length significantly longer than the mean 
CCW interval length before stimulus (dashed line in Figs 1b and c), the response 
time would be: 


$$(T_{\mathrm{CCW,1st}}) + (T_{\mathrm{CW,1st}}) + (T_{\mathrm{CCW,2nd}}) + (T_{\mathrm{CW,2nd}}) + (T_{\mathrm{CCW,3rd}}) - (T_{\mathrm{CCW,1st,prestimulus}})$$

The dashed line in Fig. 1b and c and Supplementary Fig. 1e represents the trend of
the mean pre-stimulus CCW interval length in each CW bias bin. Because of the 
presence of a few outliers, we used the geometric mean to compute the trend of the 
mean CCW interval lengths after stimulus and mean pre-stimulus CCW interval 
length within each CW bias bin (Fig. 1b and c). 
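The response-time bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function and parameter names are hypothetical, the stimulus is assumed to arrive during a CCW interval, and a plain threshold comparison stands in for the "significantly longer" criterion.

```python
def response_time(post_stimulus_intervals, mean_pre_ccw, prestimulus_portion=0.0):
    """Sum the run of lengthened CCW intervals that follows a stimulus.

    post_stimulus_intervals: alternating durations [CCW_1, CW_1, CCW_2, CW_2, ...],
        starting with the CCW interval in which the stimulus arrives.
    mean_pre_ccw: mean pre-stimulus CCW interval length, used as the threshold.
    prestimulus_portion: part of CCW_1 that elapsed before the stimulus.
    """
    total = 0.0
    pending_cw = 0.0  # CW interval counted only if the next CCW interval qualifies
    responded = False
    for index, duration in enumerate(post_stimulus_intervals):
        if index % 2 == 0:  # CCW interval
            if duration <= mean_pre_ccw:
                break  # first CCW interval back at the pre-stimulus level
            total += pending_cw + duration
            responded = True
        else:  # CW interval between two successive CCW intervals
            pending_cw = duration
    return total - prestimulus_portion if responded else 0.0


if __name__ == "__main__":
    # Hypothetical durations in seconds: three lengthened CCW intervals, then recovery.
    intervals = [10.0, 0.5, 8.0, 0.4, 6.0, 0.3, 1.0]
    print(response_time(intervals, mean_pre_ccw=2.0, prestimulus_portion=1.5))
```

In this example the sum runs over the first three CCW intervals and the two CW intervals between them, then subtracts the pre-stimulus portion of the first CCW interval, mirroring the worked expression in the text.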

Correlation time. To determine the correlation time of the CCW sequences, we
used serial correlation coefficients (Supplementary Fig. 4c) for the CCW interval
lengths. We converted the number of correlated sequences to real correlation
time lengths, including the half-length of the first uncorrelated CCW interval.
To determine whether the sequences in each lag (the number of preceding CCW
intervals) were correlated, we used the Wilcoxon rank sum test (the “ranksum”
Matlab function) at a significance level of P = 0.01 (Supplementary Fig. 4d), as in
ref. 20. We considered the first non-zero lag that had h = 0 (that is, the test did
not reject the null hypothesis) as the end of the correlation.

Low-frequency noise and motor noise. We define the low-frequency noise
$N_i^{\mathrm{lf}}$ of the ith cell as the integrated power density $P_i(f)$ of the binary
time series from $f_1$ = 1/1,500 s⁻¹ to $f_2$ = 1/10 s⁻¹, which is
$N_i^{\mathrm{lf}} = \int_{f_1}^{f_2} P_i(f)\,df$ (Fig. 3a). We define the low-frequency motor
noise $N_i^{\mathrm{lf,m}}$ as the integrated flat baseline of the power
density (Fig. 3a, dark grey line) on the same timescale.
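As a rough illustration of the two integrals just defined, the sketch below integrates a sampled power spectral density over a frequency band with the trapezoidal rule and integrates a flat baseline over the same band. The function names and the sample spectrum are hypothetical, and no interpolation is done at the band edges.

```python
def band_power(freqs, psd, f_low, f_high):
    """Trapezoidal integral of a sampled power spectral density over [f_low, f_high].

    Only samples whose frequency lies inside the band are used.
    """
    points = [(f, p) for f, p in zip(freqs, psd) if f_low <= f <= f_high]
    return sum(
        0.5 * (p0 + p1) * (f1 - f0)
        for (f0, p0), (f1, p1) in zip(points, points[1:])
    )


def motor_baseline_power(baseline_level, f_low, f_high):
    """Integral of a flat baseline of height baseline_level over the same band."""
    return baseline_level * (f_high - f_low)


if __name__ == "__main__":
    # Hypothetical spectrum: a flat baseline plus extra power at low frequency.
    freqs = [k / 1500.0 for k in range(1, 151)]
    psd = [0.01 + 0.0005 / f for f in freqs]
    f1, f2 = 1.0 / 1500.0, 1.0 / 10.0
    print(band_power(freqs, psd, f1, f2), motor_baseline_power(0.01, f1, f2))
```

The difference between the two numbers is the low-frequency power in excess of the flat motor baseline.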

Estimating signalling noise. To estimate the signalling noise, we used a formula

$$\sigma^2_{M,\mathrm{total}} = \sigma^2_M + g_M^2\, b^2\, \frac{\sigma^2_{\text{CheY-P}}}{\overline{[\text{CheY-P}]}^2},$$

which shows the relationship between the variance $\sigma^2_{\text{CheY-P}}$ of [CheY-P] and
the variance $\sigma^2_{M,\mathrm{total}}$ of the output signals. This
formula was derived from a model recently introduced to describe generally the
gain-noise relationship between the input and output signals in the chemical
reaction network. As ref. 22 showed, the temporally fluctuating output signal
from a well defined steady state (CW bias = b) due to the fluctuating input signal
([CheY-P]) is described by the following linearized chemical Langevin equation:

$$\frac{d}{dt}\,\delta b = \gamma_M\,\delta[\text{CheY-P}] - \frac{\delta b}{\tau_M} + \xi_M(t),$$

where δb and δ[CheY-P] are small deviations of the CW bias and [CheY-P] from
their steady values, respectively, $\tau_M$ is the typical timescale of the motor
alone and $\xi_M(t)$ is the Gaussian white-noise term that satisfies
$\langle\xi_M(t)\rangle = 0$ and $\langle\xi_M(t)\,\xi_M(t')\rangle = \sigma^2_{\xi_M}\,\delta(t - t')$. From this equation, we
obtain the total variance of the output signals due to the temporally fluctuating
input signals and the Gaussian white noise:

$$\sigma^2_{M,\mathrm{total}} = \sigma^2_M + g_M^2\, b^2\, \frac{\tau_{\text{CheY-P}}}{\tau_M + \tau_{\text{CheY-P}}}\, \frac{\sigma^2_{\text{CheY-P}}}{\overline{[\text{CheY-P}]}^2},$$

where $\overline{[\text{CheY-P}]}$ is the steady value of fluctuating [CheY-P] values given by:

$$\overline{[\text{CheY-P}]} = K_M \left(\frac{b}{1-b}\right)^{1/N_H},$$

where $K_M$ (the concentration of CheY-P that yields CW bias = 0.5) and
the Hill coefficient $N_H$ are given by 3.1 µM and 10.3, respectively, in ref. 15.


The constant Oy, in the first term is defined by Oy =2yy,[CheY-P| / oz,” and b 
is the CW bias. $g_M$ is the gain function defined as the ratio of the fractional
change of the output signal to the input signal: that is,
$g_M = (\delta b / b) / (\delta[\text{CheY-P}] / \overline{[\text{CheY-P}]}) = N_H(1 - b)$, where $N_H(1 - b)$ is
obtained from ref. 15. $\tau_{\text{CheY-P}}$ is a characteristic timescale of the [CheY-P]
fluctuations and is proportional to the input noise $\sigma^2_{\text{CheY-P}}$ as follows:
$\tau_{\text{CheY-P}} = 2\,\sigma^2_{\text{CheY-P}} / \sigma^2_{\xi_{\text{CheY-P}}}$.
This relationship is derived from the chemical Langevin equation describing
the [CheY-P] fluctuations from its steady state ($\overline{[\text{CheY-P}]}$):

$$\frac{d}{dt}\,\delta[\text{CheY-P}] = -\frac{\delta[\text{CheY-P}]}{\tau_{\text{CheY-P}}} + \xi_{\text{CheY-P}}(t),$$

where $\xi_{\text{CheY-P}}(t)$ is a Gaussian white-noise term that satisfies
$\langle\xi_{\text{CheY-P}}(t)\rangle = 0$ and
$\langle\xi_{\text{CheY-P}}(t)\,\xi_{\text{CheY-P}}(t')\rangle = \sigma^2_{\xi_{\text{CheY-P}}}\,\delta(t - t')$. As long as the external
stimulus is small enough, the response time to the stimulus should scale with
$\tau_{\text{CheY-P}}$. Over the broad range of functioning states examined in this paper, one
condition holds among the timescales involved in this system:
$\tau_{\text{CheY-P}} \gg \tau_M$. Under this condition, the above formula
for the total variance of the output signals can be simplified to

$$\sigma^2_{M,\mathrm{total}} = \sigma^2_M + g_M^2\, b^2\, \frac{\sigma^2_{\text{CheY-P}}}{\overline{[\text{CheY-P}]}^2},$$

where $\sigma^2_{M,\mathrm{total}}$ is given by b(1 - b) for any binary time series and is equal to the
integral of the power spectral density over all frequencies (black line in
Supplementary Fig. 5) averaged over all cells (wild-type RP437 and RP437
expressing CheR from pZE21-CheR) and $\sigma^2_M$ is equal to the integral of the power density
(dark grey line in Supplementary Fig. 5) of the isolated motor. We approximated
the baseline of the motor power density by finding the mean value of the flat regime
(from $f_1$ = 1/10 s⁻¹ to $f_2$ = 1/5 s⁻¹) of the average experimental power density and
extending the baseline to the lowest frequency. By using the simplified formula
above, we estimated the $\sigma^2_{\text{CheY-P}}$ values in each CW bias bin (Fig. 3b).
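The estimate described above amounts to solving the simplified variance formula for the input noise, using the Hill relation for the steady [CheY-P] level and the gain N_H(1 - b). The sketch below shows that arithmetic. The function names and the example motor variance are hypothetical; the values of K_M and N_H are the ones quoted in the text.

```python
K_M = 3.1   # µM, CheY-P concentration giving CW bias 0.5 (value quoted in the text)
N_H = 10.3  # Hill coefficient (value quoted in the text)


def steady_chey_p(bias):
    """Steady [CheY-P] in µM for a given CW bias, from the inverted Hill function."""
    return K_M * (bias / (1.0 - bias)) ** (1.0 / N_H)


def gain(bias):
    """Gain of the motor, g_M = N_H * (1 - b)."""
    return N_H * (1.0 - bias)


def signalling_noise(bias, motor_variance):
    """Solve total = motor + g^2 b^2 sigma^2 / [CheY-P]^2 for sigma^2 (µM^2).

    The total variance of a binary time series with CW bias b is b * (1 - b).
    """
    total_variance = bias * (1.0 - bias)
    excess = total_variance - motor_variance
    return excess * steady_chey_p(bias) ** 2 / (gain(bias) * bias) ** 2


if __name__ == "__main__":
    motor_variance = 0.05  # hypothetical integrated motor baseline
    for bias in (0.1, 0.2, 0.3, 0.5):
        print(bias, signalling_noise(bias, motor_variance))
```

Because the gain and the steady concentration both depend on the CW bias, the same excess variance maps to different input noise levels in different bias bins.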
Definition of noise. We hypothesize that a small number of proteins and thermally 
activated biochemical reaction rates cause stochastic fluctuations between func- 
tional states of signalling proteins. Operationally, we monitor the cellular behaviour 
in a motility medium that does not support growth but allows bacteria to perform 
chemotaxis. Under these conditions, the observed noise does not result from protein 
synthesis or degradation; rather, it results from fluctuations in protein functional 
states about a well-defined steady state.

Tumour vascularization via endothelial

Glioblastoma is a highly angiogenetic malignancy, the neoformed
vessels of which are thought to arise by sprouting of pre-existing
brain capillaries. The recent demonstration that a population of
glioblastoma stem-like cells (GSCs) maintains glioblastomas
indicates that the progeny of these cells may not be confined to
the neural lineage. Normal neural stem cells are able to differentiate
into functional endothelial cells. The connection between neural
stem cells and the endothelial compartment seems to be critical in 
glioblastoma, where cancer stem cells closely interact with the vas- 
cular niche and promote angiogenesis through the release of vas- 
cular endothelial growth factor (VEGF) and stromal-derived factor 
1 (refs 5-9). Here we show that a variable number (range 20-90%, 
mean 60.7%) of endothelial cells in glioblastoma carry the same 
genomic alteration as tumour cells, indicating that a significant 
portion of the vascular endothelium has a neoplastic origin. The 
vascular endothelium contained a subset of tumorigenic cells that 
produced highly vascularized anaplastic tumours with areas of vas- 
culogenic mimicry in immunocompromised mice. In vitro culture 
of GSCs in endothelial conditions generated progeny with pheno- 
typic and functional features of endothelial cells. Likewise, orthoto- 
pic or subcutaneous injection of GSCs in immunocompromised 
mice produced tumour xenografts, the vessels of which were 
primarily composed of human endothelial cells. Selective targeting 
of endothelial cells generated by GSCs in mouse xenografts resulted 
in tumour reduction and degeneration, indicating the functional 
relevance of the GSC-derived endothelial vessels. These findings 
describe a new mechanism for tumour vasculogenesis and may 
explain the presence of cancer-derived endothelial-like cells in 
several malignancies. 

From archival material, we selected a group of glioblastomas showing
both remarkable angiogenesis and nuclear accumulation of mutant p53
in tumour cells (Supplementary Table 1). In 83.3% (20/24) of these
tumours, we found cells with nuclear accumulation of mutant p53 that
lined the lumens of capillaries and/or vascular glomeruli (Supplementary
Fig. 1a and Supplementary Table 1). Double immunohistochemistry
analysis of p53 and CD31 demonstrated the endothelial phenotype of
the p53-positive cells facing the lumen of the vessels (Supplementary
Fig. 1b). Mouse and human tumour-associated endothelial cells can
harbour chromosomal alterations. To assess whether a subset of
endothelial cells showed glioblastoma-specific chromosomal aberrations,
we analysed the tumour vasculature in 15 glioblastomas by combined
CD31 immunofluorescence and fluorescence in situ hybridization
(FISH) using probes for the centromere of chromosome 10 (Cep10), for
the telomere of chromosome 19 (Tel19q), and a locus-specific probe on
chromosome 22 (breakpoint cluster region locus q11.2; LSI22). In all the
tumours carrying aneuploidy for one or more of these chromosomes, we
detected a substantial fraction of endothelial cells bearing the same
chromosomal aberrations (Supplementary Fig. 1c). Interestingly, double
immunostaining of vascular glomeruli in glioblastoma revealed a significant
number of GFAP⁺ microvascular cells showing an aberrant
endothelial/glial phenotype (Supplementary Fig. 1d). Thus, a variable
number of endothelial cells in glioblastoma seem to originate from the
tumour. To quantify the contribution of tumour-derived endothelial
cells to glioblastoma vasculature, we used FISH to analyse purified
CD31⁺/CD144⁺ (VE-cadherin⁺) endothelial cells from freshly dissociated
glioblastoma specimens (Fig. 1a). Again, we detected CD31⁺/CD144⁺
endothelial cells that shared the same chromosomal alterations
as the tumour cells in any given glioblastoma harbouring aberrations
of chromosomes 10, 19 and 22 (Fig. 1b and Supplementary
Table 2). The proportion of endothelial cells with tumour-specific
chromosomal changes ranged between 20 and 90% of the sorted cells
(mean 60.7 ± 28.1%, standard deviation (s.d.)).

We assessed further the phenotype of sorted CD31⁺/CD144⁺ glioblastoma
cells by immunofluorescence, which showed that the vast
majority of these cells (83.9 ± 4.2%; range 79-90%) expressed the
mature endothelial cell marker von Willebrand factor (vWF), although
a substantial proportion of them (30.9 ± 21.3%; range 10-76%)
co-expressed vWF and GFAP (Fig. 1c, d). Thus, it seems that the CD31⁺/
CD144⁺ cells harbouring chromosomal aberrations are glioblastoma-derived
endothelial cells that either differentiated towards the canonical
endothelial lineage (GFAP⁻) or showed a mixed endothelial/glial
phenotype (GFAP⁺), whereas the euploid fraction is likely to represent
endothelial cells derived from normal brain vessels. In vitro experiments
using a microvascular culture of fresh CD31⁺ cells isolated
by magnetic microbeads from glioblastoma samples confirmed the
existence of endothelial cells with aberrant GFAP expression
(Supplementary Fig. 2a-c), as well as the presence of a substantial number
of aneuploid endothelial cells (Supplementary Fig. 2d, e). Grafting of
freshly purified CD31⁺/CD144⁺ cells showed that three of five glioblastomas
contained tumorigenic endothelial cells that produced highly
vascularized anaplastic tumours (Supplementary Fig. 3a-c). These cells,
however, lost their tumorigenic activity on in vitro culture with
endothelial medium (Supplementary Fig. 3a).

Although there is no general agreement on the definition and markers
identifying so-called cancer stem cells, there is good evidence that GSCs
can be enriched by the use of anti-CD133 antibodies or through the
generation of clusters of undifferentiated cells (neurospheres) in
serum-free media containing epidermal growth factor (EGF) and basic
fibroblast growth factor (FGF). We demonstrated recently that GSCs
can differentiate into mesenchymal cells, giving rise to osteoblastic and
chondrocytic cells. To determine the potential contribution of GSCs to the
angiogenic process, we cultivated glioblastoma neurospheres
and primary glioblastoma differentiated cells under endothelial conditions,
or CD133⁺/CD31⁻ and CD133⁻/CD31⁻ cells derived from the
same tumours. Whereas cells enriched in GSCs generated microvascular
cultures of CD31⁺ and Tie2⁺ cells, neither differentiated cells
nor the U87MG cell line were able to produce endothelial-like cells
(Fig. 2a). Such GSC-derived endothelial cells showed considerable
tube-forming ability, together with low-density lipoprotein (LDL)
uptake and endothelial nitric oxide synthase (eNOS) expression, which
were completely absent in differentiated tumour cells and in the
U87MG cell line (Fig. 2b-d and Supplementary Fig. 4). Unsupervised
gene-expression analysis of glioblastoma and endothelial cells showed
that neural-differentiated glioblastoma cells and normal endothelial
cells constitute the two most distant groups in a dendrogram in which
tumour endothelial cells cluster between normal endothelial cells and
glioblastoma neurospheres (Supplementary Fig. 5).

Figure 1 | Microvascular endothelial cells isolated from glioblastoma
harbour tumour-specific chromosomal aberrations. a, CD31⁺/CD144⁺
cells were isolated from surgical glioblastoma specimens (n = 15). FITC,
fluorescein isothiocyanate; PE, phycoerythrin. b, Sorted cells were analysed by
interphase FISH assay for tumour-specific chromosomal changes, such as
monosomy of Cep10 (left, arrows) or polysomy of Tel19 and LSI22 (right).
c, The phenotype of the CD31⁺/CD144⁺ sorted cells was further analysed by
anti-GFAP and anti-vWF immunofluorescence. A fraction of the CD31⁺/
CD144⁺ cells coexpressed GFAP and vWF (arrows), indicating an aberrant
endothelial/glial phenotype. A minority of sorted cells were GFAP⁺/vWF⁻.
d, Quantification of results from FISH and immunofluorescence analysis.

To investigate the ability of GSCs to form endothelial vessels in vivo,
we measured the relative amount of murine versus human endothelial
cells within glioblastoma neurosphere xenografts (Fig. 3a). Flow cytometry
analysis with human- and mouse-specific antibodies showed
that about 70% of the CD31⁺ cells from the inner portion of the
tumour were of human origin, whereas nearly all the CD31⁺ cells in
the tumour capsule were murine (Fig. 3b). Likewise, human CD144⁺
cells were detected only in the core and not in the tumour capsule
(Fig. 3b). Immunohistochemistry of subcutaneous and intracranial
xenografts showed that glioblastoma neurosphere-derived tumours
contained human vessels labelled by human-specific anti-CD31,
whereas xenografts generated with U87MG or other glioma cell lines
grown in serum did not (Fig. 3c, Supplementary Fig. 6a and data not
shown). The presence of human-derived endothelial cells was confirmed
by labelling sections of tumour xenografts obtained with
GFP⁺ glioblastoma neurospheres with anti-GFP and anti-human


CD31 antibodies (Supplementary Fig. 6b). Moreover, immunofluorescence
staining with validated human-specific endothelial antibodies
showed that these cells consistently expressed CD34, CD144 and
VEGFR2 (Fig. 3d) but not the stem-cell markers SSEA-1 and CD133
(Supplementary Fig. 6c). Such human-specific endothelial antigens
identified microvascular structures containing circulating erythrocytes
(Fig. 3d and Supplementary Fig. 6a), indicating the functional relevance
of human angiogenesis in the tumour xenografts. Of note, a similar
formation of human endothelial cells was observed in subcutaneous
xenografts obtained with the injection of freshly purified CD133⁺/
CD31⁻ cells, whereas CD133⁻/CD31⁻ cell xenografts contained only
mouse endothelial vessels (Supplementary Fig. 7).

Figure 2 | GSCs cultured under endothelial differentiation conditions
develop morphological, phenotypical and functional features of endothelial
cells. a, Flow cytometry analysis of human umbilical vein endothelial cells
(HUVEC), glioblastoma neurospheres (GNS), primary glioblastoma cells
cultured in serum (GDC), U87MG, CD31⁻/CD133⁺ and CD31⁻/CD133⁻
cells from freshly dissociated glioblastomas. Cells were cultured under standard
(black) or endothelial (grey) conditions. Error bars represent the mean ± s.d.
(n = 4). **P < 0.001. b, Tube formation (top) and LDL-uptake (bottom) assay
on cells under endothelial conditions as above (GNS and GDC), endothelial
cells isolated from glioblastoma patients (GBM patients) and human dermal
microvascular endothelial cells (HMVEC). DiI-ac-LDL, 1,1′-dioctadecyl-
3,3,3′,3′-tetramethylindocarbocyanine-perchlorate-acetylated LDL. Scale bars,
200 µm (top) and 50 µm (bottom). c, Immunofluorescence for eNOS in
HMVEC, GNS and GDCs treated as above. Scale bar, 100 µm. d, In vitro
perfusion assay on three-dimensional glioblastoma neurosphere-derived
endothelial culture injected with fluorescein. Scale bar, 50 µm. One
representative of four independent experiments performed blind is shown
for b, c and d.

To trace in vivo angiogenesis, we injected RFP-labelled glioblastoma
neurospheres into transgenic NOD/SCID mice expressing GFP under
the Tie2 promoter. Examination of a thick-section plane by confocal
microscopy showed that GFP⁺ mouse vessels were primarily outside
the tumours (Supplementary Fig. 8a). To exclude the occurrence of
fusion between tumour and mouse endothelial cells, we stained
tumour xenograft sections with anti-human/mouse Tie2 and CD31
antibodies. Although CD31 staining showed the presence of vessels
containing both mouse and human CD31⁺ cells at the periphery of the
tumour, the majority of endothelial cells inside the tumour mass did
not express mouse Tie2 and were of human origin in the absence of
fusion (Supplementary Fig. 8b, c). Moreover, FISH analysis of nuclei
extracted from microdissected vascular structures of GSC xenografts
confirmed the absence of murine chromosomes in human cells
(Supplementary Fig. 9). Together, these findings demonstrate that
the tumour xenografts obtained by injection of human glioblastoma
neurospheres develop an intrinsic vascular network composed of
tumour cells with an aberrant endothelial phenotype. To determine
whether the GSC-derived endothelial cells contribute to tumour
growth, we transduced glioblastoma neurospheres with a lentiviral
vector containing the herpes simplex virus thymidine kinase gene
(tk) under the control of the transcription-regulatory elements of
Tie2 (Tie2-tk; Supplementary Fig. 10a), so that the tumour-derived
endothelial cells would be sensitive to ganciclovir. For this experiment,
we selected glioblastoma neurospheres with no detectable
expression of Tie2 (Supplementary Fig. 11). Control cells included
glioblastoma neurospheres transduced with an empty viral vector and
U87MG cells transduced either with Tie2-tk or with a vector conferring


constitutive expression of Tk (PGK-tk, Supplementary Fig. 10a). One 
week after ganciclovir administration, TdT-mediated dUTP nick end 
labelling (TUNEL) and double immunofluorescence labelling with 
anti-Tie2 antibodies in tumour subcutaneous xenografts showed 
selective apoptosis of the endothelial compartment only in animals 
injected with Tie2-tk neurospheres, whereas PGK-tk tumours con- 
tained a considerable number of apoptotic nuclei both in tumour 
and endothelial cells (Fig. 4a). Moreover, tumours generated by 
Tie2-tk neurospheres underwent a significant size reduction four 
weeks after ganciclovir administration, whereas control GSC xeno- 
grafts increased their size over the same time interval (Fig. 4b). 

Figure 3 | Human origin of endothelial cells in glioblastoma neurosphere 
xenografts. a, Explanted subcutaneous xenograft obtained by injection of 
glioblastoma neurospheres. Detail of murine vessels on the surface of the 
xenograft (left, black arrowheads) and tumour after capsule removal (right). 
b, FACS evaluation of murine CD31+/CD45− (mCD31), human CD31 
(hCD31) and human CD144 (hCD144) in the capsule and core of the tumour 
(mean ± s.d., n = 4, *P < 0.05). c, Immunohistochemistry of glioblastoma 
neurosphere (GSC1) and U87MG xenografts using either an anti-human CD31 
or anti-human and murine CD31 (one out of four different glioblastoma 
neurosphere samples and serum-grown cell lines is shown). 
d, Immunofluorescence of tumour xenograft sections labelled with anti-human 
CD34 (left), anti-human CD144 (middle) or anti-human VEGFR2 (right). 
Arrows indicate circulating erythrocytes. Data represent one of four 
independent experiments obtained with different glioblastoma neurosphere 
samples. 


Histological examination revealed massive degeneration in the tumour 
xenografts developed by injection of Tie2-tk neurospheres. Four weeks 
after ganciclovir treatment, these tumours were completely devoid of 
vascular glomeruli, tiny capillaries with ongoing phenomena of 
endothelial disruption being the only residual vascular structures 
(Supplementary Fig. 10b). Although all PGK-tk tumours degenerated 
massively, U87MG Tie2-tk xenografts were not affected by ganciclovir 
treatment (Supplementary Fig. 10c, d), confirming that this cell line 
was unable to generate endothelial cells. These findings indicate that 
GSC-derived angiogenesis is essential for tumour survival. Moreover, 
mouse models based on adherent cell lines grown in serum do not seem 
suitable for the study of glioblastoma angiogenesis. 

Here we demonstrated that GSCs are able to differentiate into functional 
endothelial cells. Such angiogenic potential could be inherited 
from normal neural stem cells, which have been shown to differentiate 
into endothelial cells both in vitro and in vivo. The formation of fluid-conducting 
networks by nonendothelial cells has been described for 
melanomas, sarcomas, and breast, ovary, lung and prostate carcinomas 
as a result of vasculogenic mimicry, which is a feature associated with a 
pluripotent gene expression pattern in aggressive tumour cells. 

Figure 4 | Selective targeting of glioblastoma neurosphere-derived 
endothelial cells impairs the growth of subcutaneous tumour xenografts. 
a, Double immunofluorescence using anti-TUNEL and anti-Tie2 in xenografts 
from Tie2-tk, PGK-tk and vector glioblastoma neurosphere cells one week after 
ganciclovir administration. Arrows indicate apoptotic Tie2~ (white) and Tie2~ 
(yellow) cells. b, Tumour size measured four weeks after ganciclovir 
administration in xenografts obtained from three different glioblastoma 
neurosphere samples either untransduced (wild type (WT)) or transduced with 
vector, Tie2-tk or PGK-tk. Error bars are mean ± s.d. of three different 
experiments. *P < 0.005, **P < 0.001. 


The ability of cancer stem-like cells to directly contribute to the 
tumour vasculature by endothelial cell differentiation represents a 
new mechanism of angiogenesis that might not be restricted to glioblastoma. 
A similar endothelial potential may be shared by CD44+ 
cells purified from ovarian cancer. However, the existence of tumour-derived 
endothelial cells in ovarian cancer has not yet been demonstrated. 
Endothelial-like cells with cancer-specific genomic alterations 
have been described in other tumour types, such as lymphoma and 
neuroblastoma. Although the angiogenic activity of cancer stem-like 
cells has not been investigated in other tumours, it is likely that the 
endothelial cells bearing tumour-specific alterations derive from cancer 
cells endowed with stem-cell plasticity. Likewise, vasculogenic 
mimicry might represent an incomplete differentiation of cancer stem-like 
cells towards the endothelial lineage, as indicated by the aberrant 
mixed phenotype of glioblastoma xenografts generated by the subset of 
CD31+/CD144+ cells that retain tumorigenic activity. 

Our findings may have considerable therapeutic implications. On 
the one hand, endothelial cells bearing the same genomic alteration as 
cancer cells may show a different sensitivity to conventional anti-angiogenic 
treatments, such as VEGF/VEGFR targeting. On the other 
hand, our data indicate the possibility of targeting the process of GSC 
differentiation into endothelial cells, thus offering new therapeutic 
options for cancer treatment. 


METHODS SUMMARY 

Cell culture. Glioblastoma neurosphere cultures were established from freshly 
dissociated surgical specimens as described. Primary cultures of glioblastoma 
differentiated cells were obtained by plating cells from freshly dissociated samples 
in DMEM-F12 medium containing 10% FBS. For primary culture of glioblastoma 
microvascular endothelial cells, CD31+ cells were purified using the Miltenyi 
Microbead Kit (Miltenyi Biotech) according to the manufacturer’s instructions and 
grown in endothelial basal medium (EBM Bullet kit; Biowhittaker Cambrex). 
Immunohistochemistry, immunofluorescence and flow cytometry. Immunohistochemistry 
was performed as described on deparaffinized sections of 
glioblastoma tissue. For immunofluorescence, cells were fixed with 4% paraformaldehyde 
and permeabilized in 0.1% Triton X-100. Cytofluorimetric analysis 
was performed using a FACS Canto flow cytometer (Becton Dickinson). Cell 
sorting was performed with a FACS Aria cell sorter (Becton Dickinson). 
Interphase FISH and combined immunohistochemistry and FISH (FICTION). 
Single- and dual-probe interphase FISH was performed as described. Images were 
captured using a high-resolution black and white CCD microscope camera 
AxioCam MRm REV 2 (Carl Zeiss) and analysed using the AxioVision 4 multichannel 
fluorescence basic workstation (Carl Zeiss). 

Lentiviral infection. Selective targeting of the cells expressing the endothelial phenotype 
was obtained by modifying the pRRLsin.Tie2p.TKiresGFP.spre lentiviral 
vector provided by L. Naldini. Viral particle production and GSC infection 
were performed as previously described. 

In vivo experiments. Nude athymic and SCID mice (female, 4-5 weeks of age; 
Charles River) were used. Partially dissociated glioblastoma neurospheres were 
used for both orthotopic and subcutaneous injection, typically 10° and 5 X 10°, 
respectively. For in vivo endothelial targeting, mice were injected with Tie2-tk 
glioblastoma neurospheres into the right flank and control vector glioblastoma 
neurospheres into the left flank. After bilateral nodules had developed, mice 
received ganciclovir at 50 mg kg−1 day−1 intraperitoneally for 5 days. Ganciclovir- 
treated mice were killed at different time points to collect samples for histology and 
immunofluorescence. 

Statistical analysis. Student’s t-test was used to analyse data using Statistica 
(version 5.5; Statsoft) or Fig.P (version 2.7; Biosoft) software. 
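
As a minimal illustration only (the analyses reported here were run in Statistica or Fig.P, not with this code), the two-sample Student's t statistic with pooled variance can be sketched as below. The function name `student_t` and the example measurements are hypothetical and are not data from this study.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed).

    Returns the t statistic and the degrees of freedom.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighted by each group's degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical tumour sizes in arbitrary units (illustrative values only).
treated = [1.0, 2.0, 3.0, 4.0]
control = [3.0, 4.0, 5.0, 6.0]
t, df = student_t(treated, control)
```

The P value would then be read from the t distribution with `df` degrees of freedom, for example from statistical tables or a statistics package.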


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Glioblastoma neurosphere isolation and characterization. Glioblastoma tissue 
specimens were obtained from adult patients undergoing craniotomy at the 
Institute of Neurosurgery, Catholic University School of Medicine in Rome. 
Informed consent was obtained before surgery according to the protocols 
approved at the Catholic University. Cells were purified through mechanical 
dissociation of the tumour tissue and cultured at clonal density in a serum-free 
medium supplemented with EGF and basic FGF as described. Isolated cells 
were expanded and characterized both in vitro and in vivo. In these conditions, 
cells were able to grow in vitro in clusters called neurospheres and maintain an 
undifferentiated state, as indicated by morphology and expression of stem-cell 
markers such as CD133, SOX2, musashi and nestin. Such glioma neurosphere cells 
showed a clonal frequency higher than 10%, ability to coexpress astrocytic as well 
as neuronal phenotypic markers after serum-induced differentiation in vitro, and 
generation of glial tumours in immunodeficient mice. 

Flow cytometry, immunohistochemistry and immunofluorescence. A cell suspension 
obtained by mechanical dissociation of the tumour tissue from glioblastoma 
patients from the Institute of Neurosurgery (Supplementary Table 2) was passed 
through a 100-µm mesh to remove aggregates and stained with fluorochrome-conjugated 
antibodies to surface antigens. After 1 h of incubation on ice, cells 
were washed twice with PBS and finally resuspended in PBS or in PBS containing 
7-aminoactinomycin D (7-AAD) 5 µg ml−1 to assess viability. Analysis was performed 
using a fluorescence-activated cell sorter (FACS) Canto flow cytometer 
(Becton Dickinson). Cell sorting was performed with a FACS Aria cell sorter 
(Becton Dickinson) equipped with an automatic cloning deposition unit. Cells were 
selected on the basis of physical parameters and fluorescence and were sorted on 
sterile tubes or on slides depending on their further utilization. 

For immunohistochemistry, immunofluorescence and flow cytometry the following 
antibodies were used: mouse anti-human CD31 (Novocastra); mouse anti- 
CD31 (Dako); rat anti-mouse CD31 (BD, Pharmingen); rabbit anti-GFAP 
(Chemicon or Dako); mouse anti-vWF (Dako); goat anti-human Tie2 (R&D 
Systems); mouse anti-human VEGFR2 (R&D Systems); rabbit anti-Tie2 (Santa 
Cruz Biotechnology); mouse anti-human CD144 (R&D Systems); mouse anti- 
SSEA-1 (R&D Systems); mouse anti-human nuclei antigen (Chemicon); rabbit 
anti-GFP (Molecular Probes) and anti-eNOS (BD, Pharmingen). Validation of 
antibody specificity for human and mouse endothelial antigens is shown in 
Supplementary Fig. 12. 

Interphase FISH and FICTION on glioblastoma sections. Single- and dual- 
probe interphase FISH was performed on histological sections of glioblastoma, on 
cell nuclei extracted from paraffin-embedded sections of glioblastoma, on cells 
sorted from glioblastoma samples, and on cultured microvascular endothelial cells 
of glioblastoma as described. Aneuploidy was defined as loss or gain of one or 
more chromosome FISH signals. Briefly, locus-specific probes for Cep10, Tel19q 
and LSI22 were used (Vysis). Standard FISH protocols for pretreatment, hybridization 
and analyses were followed according to the manufacturer’s instructions. 
Histological 4-µm-thick paraffin sections were dewaxed with xylene and digested 
with proteinase K 1 µg ml−1 in 0.002 M Tris-buffered saline (TBS) for 20 min at 
room temperature (20 °C). Samples were then dehydrated in a graded ethanol 
series and subjected to FISH analysis. After specimen/probe denaturation at 73 °C 
for 5 min, the probes (10 µl per slide) were applied to the slides and subsequently 
incubated overnight at 42 °C for Cep10 and at 37 °C for 10–16 h for LSI22/Tel19q. 
The post-hybridization procedure included washing in 50% formamide/ 
2× SSC (30 min at 46 °C) and 2× SSC/0.1% NP40 (5 min at room temperature). 
Nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI; Vector 
Laboratories). The slides were studied with an Axioplan fluorescence microscope 
(Carl Zeiss) that was equipped with the appropriate filter sets (Vysis). Images were 
captured using a high-resolution black and white CCD microscope camera 
AxioCam MRm REV 2 (Carl Zeiss). The resulting images were reconstructed with 
green (FITC), orange and blue (DAPI) pseudocolour using the AxioVision 4 multichannel 
fluorescence basic workstation (Carl Zeiss) according to the manufacturer’s 
instructions. Glioblastoma sorted cells were fixed in a solution of methanol 
and acetic acid (3:1) for 10 min and then processed for FISH as described. 
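
The aneuploidy criterion above (loss or gain of one or more FISH signals) can be sketched in a few lines. This is an illustrative example, not the scoring procedure used in the study: it assumes two signals per probe in a normal diploid nucleus, and the function names and per-nucleus counts are hypothetical.

```python
def is_aneuploid(signals, expected=2):
    """A nucleus is scored aneuploid if it has lost or gained at least one signal.

    `expected` is the signal count assumed for a normal diploid nucleus.
    """
    return signals != expected


def aneuploid_fraction(counts, expected=2):
    """Fraction of scored nuclei whose signal count differs from `expected`."""
    if not counts:
        raise ValueError("no nuclei were scored")
    return sum(is_aneuploid(c, expected) for c in counts) / len(counts)


# Hypothetical signal counts for one probe across eight nuclei.
counts = [2, 3, 2, 1, 4, 2, 2, 3]
fraction = aneuploid_fraction(counts)
```

In practice a cut-off derived from control nuclei would usually be applied before calling a sample aneuploid, because truncated nuclei in tissue sections can show artefactual signal loss.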

Laser capture microdissection of vessels from GSC-derived xenografts. We 
isolated the vascular structures of tumour xenografts using the Laser Capture 
Microdissection (LCM) System (PixCell IIe, Arcturus; distributed by Euroclone). 
LCM was performed on CD31-immunostained (M-20, Santa Cruz Biotechnology) 
paraffin sections (10-µm thick) of tumour xenografts. For each sample, laser power 
(50-70 mW) and laser duration (1-1.2 ms) were adjusted. The microdissected 
tissue was then transferred to an LCM cap and the cells were incubated in 100 ml 
digestion buffer (0.005% proteinase K in tris(hydroxymethyl)aminomethane 
(TRIS) 0.05M pH 7). Endothelial cell nuclei were isolated using the NE-PER 
Nuclear and Cytoplasmic Extraction Reagents (Thermo Scientific) following 
manufacturer recommendations. Successively, nuclei were washed with PBS and 
fixed in a solution of methanol/acetic acid (3:1). Eight millilitres of nuclei suspension 
were placed on a positively charged slide and were dried in a 65 °C oven for 
30 min. 

FISH on cell nuclei extracted from tumour xenografts. To distinguish human 
endothelial cells from mouse cells, we performed FISH analyses using locus-specific 
probes for Cep10 (Vysis) and a Cy3-conjugated mouse pan-centromeric chromosome 
probe (Cambio). FISH protocols for the Cep10 probe were performed as previously 
described, whereas for the mouse pan-centromeric probe we followed the 
manufacturer’s instructions. Briefly, after enzymatic digestion with 4 mg ml−1 
pepsin in NaCl 0.9% pH 1.5 for 20 min at 37 °C, the nuclei were denatured in 
70% formamide in 2× SSC for 2 min at 70 °C, and were subsequently immersed in 
ice-cold 70% ethanol and dehydrated through a series of alcohol washes at 79%, 
90% and 100%. The probe was denatured for 10 min at 85 °C and immediately 
chilled on ice. After specimen/probe denaturation, probe was applied to the slide 
and subsequently incubated overnight at 37 °C. After washing, nuclei were then 
counterstained with DAPI (Vectashield mounting medium with DAPI; Vector 
Laboratories). 

Isolation and culture of human glioblastoma microvascular endothelial cells. 
Glioblastoma tissue specimens were stored in medium M199 (Gibco) containing 
penicillin 100 U ml−1 at 4 °C for less than 24 h before processing. After several washes 
with PBS/antibiotics, tissue was finely minced using surgical scissors and then incubated 
for 2-3 h at 37 °C in Dulbecco’s medium (Gibco) containing 0.2% bovine 
serum albumin (BSA) and liberase blendzyme 2-2.5 mg ml−1 (Roche Diagnostics). 
Cellular macroaggregates still present after enzymatic digestion were removed by 
filtration through a 10-µm pore-size filter (Dako), thus obtaining a monocellular 
suspension. The filtrate was then washed twice with PBS and centrifuged, the pellet 
resuspended in 1 ml cold PBS/0.1% BSA pH 7.4. Selection of endothelial cells was 
performed by using CD31 Miltenyi Microbead Kit (Miltenyi Biotech) according to 
manufacturer’s instructions directly on cell suspensions after enzymatic digestion. 
Purified cell clusters as well as the negative counterparts were separately resuspended 
in endothelial basal growth medium (EBM Bullet kit; Biowhittaker Cambrex). Cells 
were plated onto 25-cm2 culture dishes, previously coated with 1 µg cm−2 collagen 
type I and 1 µg cm−2 fibronectin (Sigma), and maintained at 37 °C in an atmosphere 
of 5% CO2. After 10-12 h, plated cells were washed three times with cold PBS to 
favour detachment of nonendothelial cells. The medium was changed every 3 days. 
Once at confluence, cells were detached by trypsinization with 0.25% Trypsin/EDTA 
(Gibco) and reseeded on collagen/fibronectin-coated culture dishes at a split ratio of 
1:3. A second magnetic selection was performed on plated endothelial cells after 7-10 
cell divisions in order to increase the purity of the cultures. 

Endothelial function assays. For in vitro three-dimensional tube formation assay, 
twelve microlitres of tail collagen were dropped onto glass coverslips and allowed 
to polymerize for 1 h at 37 °C. Cells were then seeded on top of the gels at 50,000 
cells per well and allowed to incubate. Then endothelial basal medium was added 
and cells were cultured for 7 days. To quantify the tube formation, image-analysis 
techniques were used that measure the length of the tubes and the number of the 
connections. Data were photographically recorded daily. The average total length 
and mean total number of junctions for different endothelial cords were further 
analysed using the two-sided Mann-Whitney U test. For microinjections, a Zeiss 
microscope with a manipulator was used. Fluorescein (Monico) was prediluted 
1:1,000 into medium and injected into three-dimensional culture with a Hamilton, 
and observed with a Zeiss Axiovision device camera. 

To determine the uptake of acetylated LDLs, cells were incubated with 10 mg ml−1 
DiI-labelled (1,1′-dioctadecyl-3,3,3′,3′-tetramethylindocarbocyanine perchlorate) 
acetylated LDL (Molecular Probes) at 37 °C for 4 h. The slides were analysed using a 
Nikon Eclipse TE300 inverted microscope equipped with a Zeiss Axiovision device 
camera. 

Gene array. Total RNA was extracted from glioblastoma neurospheres, serum- 
differentiated glioblastoma neurospheres, glioblastoma neurospheres cultivated 
under endothelial condition and endothelial cells isolated from glioblastoma 
patients. Normal human umbilical vascular (HUVEC) or microvascular 
(HMVEC) endothelial cells were used as controls for endothelial gene expression 
patterns. RNA was labelled and hybridized to Affymetrix GeneChip 1.0 ST arrays 
following the manufacturer’s instructions. Hybridization values were normalized 
by the robust multiarray averaging (RMA) method and hierarchical clustering, 
with average linkage method, was performed according to samples’ gene expression 
profile. Full data were submitted to ArrayExpress under the accession number 
E-MEXP-2891. 
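
The average-linkage step can be illustrated with a small, self-contained sketch. It is not the analysis pipeline used for the arrays: RMA normalization is not reproduced, the Euclidean distance is an assumption (the distance metric is not stated above), and the sample profiles are invented two-gene vectors chosen only to make the merge order easy to follow.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    `profiles` maps sample name -> list of expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample names.
    """
    def dist(a, b):
        # Average of all pairwise distances between members of a and b.
        total = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical normalized expression values (two genes per sample).
profiles = {
    "HUVEC": [9.0, 8.0],
    "HMVEC": [9.0, 8.5],
    "GNS": [2.0, 1.0],
    "GDC": [2.0, 3.0],
}
merges = average_linkage(profiles)
```

With these invented values the two endothelial controls merge first, the two tumour-derived samples merge second, and the two resulting clusters are joined last; a correlation-based distance could be substituted for `euclidean` without changing the linkage logic.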

Lentiviral infection. Selective targeting of the cells expressing endothelial pheno- 
type was obtained by modifying the pRRLsin.Tie2p.TKiresGFP.spre lentiviral 
vector provided by L. Naldini. Viral particle production and GSC infection 
were performed as previously described. 

In vivo experiments. Studies involving animals were approved by the Ethical 
Committee of the Catholic University School of Medicine in Rome. Nude athymic 
and SCID mice (female, 4–5 weeks of age; Charles River) were used. For subcutaneous 
xenografts, cells were resuspended 1 × 10° in 0.1 ml of cold PBS, mixed with an equal 
volume of cold Matrigel (BD Bioscience), and injected into the flanks of nude mice. 
For intracranial xenografts, 2 × 10° cells in 5 µl of PBS were injected stereotactically 
onto the striatum. Mice were killed by 16–20 weeks after grafting to collect tumour 
xenografts. On ganciclovir treatment, no major toxicity was observed in vital organs. 


Glioblastoma stem-like cells give rise to tumour endothelium 

Rong Wang, Kalyani Chadalavada, Jennifer Wilshire, Urszula Kowalik, Koos E. Hovinga, Adam Geber, Boris Fligelman, 
Margaret Leversha, Cameron Brennan & Viviane Tabar 


Glioblastoma (GBM) is among the most aggressive of human cancers. 
A key feature of GBMs is the extensive network of abnormal vasculature 
characterized by glomeruloid structures and endothelial hyperplasia. 
Yet the mechanisms of angiogenesis and the origin of tumour 
endothelial cells remain poorly defined. Here we demonstrate that 
a subpopulation of endothelial cells within glioblastomas harbours 
the same somatic mutations identified within tumour cells, such as 
amplification of EGFR and chromosome 7. We additionally demonstrate 
that the stem-cell-like CD133+ fraction includes a subset of 
vascular endothelial-cadherin (CD144)-expressing cells that show 
characteristics of endothelial progenitors capable of maturation into 
endothelial cells. Extensive in vitro and in vivo lineage analyses, 
including single cell clonal studies, further show that a subpopulation 
of the CD133+ stem-like cell fraction is multipotent and capable 
of differentiation along tumour and endothelial lineages, possibly 
via an intermediate CD133+/CD144+ progenitor cell. The findings 
are supported by genetic studies of specific exons selected from 
The Cancer Genome Atlas, quantitative FISH and comparative 
genomic hybridization data that demonstrate identical genomic 
profiles in the CD133+ tumour cells, their endothelial progenitor 
derivatives and mature endothelium. Exposure to the clinical anti-angiogenesis 
agent bevacizumab or to a γ-secretase inhibitor, as 
well as knockdown shRNA studies, demonstrate that blocking 
VEGF or silencing VEGFR2 inhibits the maturation of tumour 
endothelial progenitors into endothelium but not the differentiation 
of CD133+ cells into endothelial progenitors, whereas γ-secretase 
inhibition or NOTCH1 silencing blocks the transition into endothelial 
progenitors. These data may provide new perspectives on the 
mechanisms of failure of anti-angiogenesis inhibitors currently in 
use. The lineage plasticity and capacity to generate tumour vascula- 
ture of the putative cancer stem cells within glioblastoma are novel 
findings that provide new insight into the biology of gliomas and the 
definition of cancer stemness, as well as the mechanisms of tumour 
neo-angiogenesis. 

Blood vessels within GBM express a variety of markers, including 
CD31 and CD105 (also known as PECAM1 and ENG, respectively); 
CD105 is a proliferation-associated molecule expressed in angiogenic 
endothelium. Quantitative analysis of 16 GBM specimens by 
fluorescence-activated cell sorting (FACS) and immunohistochemistry 
showed that more than 70% of CD105+ cells co-express CD31 
(Fig. 1a, b), VEGFR2 (also known as KDR) and von Willebrand factor 
(also known as VWF), exhibit endothelial morphology and are labelled 
by DiI-AcLDL (1,1′-dioctadecyl-3,3,3′,3′-tetramethyl-indocarbocyanine 
perchlorate-labelled acetylated low density lipoproteins, ref. 10), suggesting 
an endothelial phenotype (Supplementary Fig. 1a). On average, ~5% 
of the total cell population expressed CD31 in normal brain and GBM 
specimens (n = 7), whereas CD105+ cells were essentially absent in 
normal brain (Supplementary Fig. 1b). CD105+ cells were also isolated 
by FACS from fresh GBM specimens and injected with a collagen 
matrix into the flank of NOD-SCID mice. The resulting implants were 
composed of a network of vascular channels of human origin, expressed 
CD105 and CD31 and showed evidence of uptake of systemically injected 
lectin (Fig. 1c). 

Whereas endothelial cells in GBMs are often classified as ‘hyperplastic’, 
the abnormal blood vessel architecture, the distinct gene expression 
profiles and the selective emergence of abnormal vessels in GBMs 
versus lower grade gliomas suggest a more complex ontogeny of GBM 
endothelium. We performed quantitative fluorescence in situ hybridization 
(FISH) analyses for EGFR and chromosome 7 (ref. 13) on 
CD105+ cells isolated by FACS and on sections of the corresponding 
GBM parent tumour (Fig. 1d, e and Supplementary Fig. 2). The proportion 
of CD105+ cells harbouring ≥3 copies of the EGFR amplicon or 
the centromeric portion of chromosome 7 was comparable to the proportion 
of tumour cells with the same aberrations (Supplementary Table 
1a). We also performed quantitative PCR (qPCR) for three segments of 
the EGFR amplicon (exons 4, 9 and 11), known to be mutated at high 
frequency according to data from The Cancer Genome Atlas. The data 
demonstrate a similar copy number in the CD105+ cells and the corresponding 
parent tumour (Supplementary Table 1b) and indicate that a 
proportion of tumour endothelial cells within GBM is in fact neoplastic. 

CD133 is a cell surface glycoprotein used extensively as a marker of 
putative cancer stem cells (CSCs) but also expressed in haematopoietic 
stem cells. Although the specific identity and definition of CSCs 
remain a matter of debate, we proposed that the CD133+ fraction 
may be related to the endothelial differentiation potential observed. 
Acutely dissociated cells from a series of 14 GBMs were fractionated 
into four groups: (1) CD144+/CD133−, (2) CD144+/CD133+ (double 
positive, DP), (3) CD133+/CD144− and (4) CD133−/CD144− (double 
negative, DN) (Fig. 2a). All samples contained the four fractions, with 
the DN being the largest population (Supplementary Table 4). 
Quantitative PCR with reverse transcription (qRT-PCR) analysis for 
endothelial markers (Supplementary Fig. 3a) demonstrated marked 
enrichment of VEGFR2 and the endothelial progenitor marker CD34 
in the CD144+/CD133− and in the DP populations. CD105 was consistently 
absent in the CD133+ and CD144+ fractions. To define lineage 
potential further, DP cells were cultured for 5 days in endothelial cell 
medium, which resulted in the downregulation of CD144, the upregulation 
of CD105 and CD31, as well as co-expression of VEGFR2 and CD34 
and labelling with DiI-AcLDL (Fig. 2b and Supplementary Fig. 3b). 
When grown in three-dimensional (3D) gel cultures, the in vitro DP-derived 
endothelial cells form vascular networks reminiscent of normal 
endothelium, but also thickened channel walls and areas of confluence 
more suggestive of abnormal tumour vessels (Fig. 2c, d). The primary 
CD105+ cells also form glomeruloid-like structures in 3D gel, with high 
lectin uptake (Supplementary Fig. 1c). DP-derived CD105+ cells were 
sorted and injected subcutaneously in NOD/SCID mice, giving rise to 

Figure 1 | CD105+ endothelial cells in GBM harbour genomic aberrations. 
a, FACS analysis and quantification of GBM-derived CD105+ cells shows co-expression 
of other endothelial cell markers (CD31, VEGFR2) and uptake of 
DiI-AcLDL (n = 3). FITC, fluorescein isothiocyanate; PE, phycoerythrin. 
b, CD105 immunostaining in GBMs delineates microvessels co-labelling with 
CD31 and glomeruloid vessels surrounded by caldesmon (CALD)-expressing 
pericytes. c, Functional neovessel formation by GBM-derived CD105+ cells in 
the flank of NOD-SCID mice. Confocal immunofluorescence demonstrates 
co-localization of a human mitochondria marker with CD31 and uptake of 
lectin by the CD105+ vessels in the implants. d, Immuno-FISH of CD105+ 
vessels in GBM specimens (cases 76 and 78) shows multiple copies of the EGFR 
amplicon (arrows). e, FISH on CD105+ cells sorted from GBMs confirms 
amplification of EGFR (red) and chromosome 7 centromere (Chr7, green) 
(arrows). Control nuclei, individually contoured, are from normal human 
fibroblasts. Scale bars, 50 µm. Error bars, s.d. 

Figure 2 | GBM-derived CD133+ cells include a fraction of endothelial 
progenitors. a, Representative FACS analysis of a GBM specimen with 
fractionation into four cell subpopulations based on the expression of CD133 
and vascular E-cadherin (CD144). b, Immunofluorescence analysis of DP 
(CD133+/CD144+) cells upon differentiation demonstrates co-expression of 
endothelial markers and DiI-AcLDL uptake. c, d, In Matrigel, DP cells 
exhibit DiI-AcLDL uptake and form tubular networks comparable to those 
shown by normal endothelial cells, as well as areas of thickened walls where cells 
are more proliferative. Scale bars, 100 µm in b and d; 300 µm in c. 


vascularized plugs identical to those obtained from primary CD105+ 
cells (Supplementary Fig. 1d). 

The CD144+/CD133− cell fraction was often very small but showed 
a restricted differentiation and immunohistochemical profile (Supplementary 
Fig. 3c, d). When grown in Matrigel, the CD144+/CD133− 
cells develop tubular, capillary-like structures and no glomeruli 
(Supplementary Fig. 3e). CD144+/CD133− cells do not express neural 
markers or form neurospheres, thus indicating a more restricted endothelial 
precursor cell identity (Supplementary Fig. 3f). Unsupervised clustering 
of transcriptome data was performed on several data sets including 
independent samples of the four sorted tumour subpopulations, as well as 
CD144+ human embryonic stem-cell-derived endothelial precursors and 
bone-marrow-derived CD34+ endothelial progenitors (Supplementary 
Fig. 3g). Taken together, these results indicate that GBMs comprise cell 
fractions capable of endothelial cell differentiation. 

The identification of genomic aberrations in tumour endothelium 
and the presence of endothelial progenitors within the CD133+ putative 
CSC fraction in GBMs led us to postulate that DP cells may represent 
the neoplastic origin of tumour endothelium and could derive from the 
CD133+ CSC fraction. CD133+/CD144− cells were then labelled with 
EFα-1::GFP (elongation factor α1–green fluorescent protein) lentiviral 
vectors, triple sorted, and GFP+/CD133+/CD144− cells were co-cultured 
in the presence of tumour cells. On day 5, FACS analysis 
demonstrated the emergence of a GFP+-DP population (Fig. 3a, b). 
When placed in collagen cultures, the GFP+-DP cells had intracellular 
vacuoles suggestive of early lumen formation by endothelial tubes 
(Fig. 3c), and differentiated into cells that express CD105 and CD31 
and exhibit DiI-AcLDL uptake (Fig. 3d and Supplementary Fig. 4a). 
Importantly, co-culture with tumour cells is essential for the emergence 
of DP cells (Fig. 3a, b). Our data confirm that the DP endothelial 

Figure 3 | CD133+/CD144− cells are multipotential and give rise to 
endothelial cells via an endothelial progenitor intermediate. a, b, Co-cultures 
of CD133+/CD144− cells with tumour cells give rise to endothelial 
progenitors that co-express CD133 and CD144 (DP) as shown and quantified 
by FACS analysis (n = 3). APC, allophycocyanin. c, GFP+-derived DP cells 
form intracellular vacuolar structures in collagen gel, characteristic of 
endothelial cells. d, Immunohistochemistry of CD133+/CD144−-derived 
endothelial cells (n = 3). e, f, Single cell clonal analysis of GFP-labelled 
CD133+/CD144− cells. GFP+ clones derived from single cells are seeded under 
neural or endothelial conditions. Normal endothelial precursor cultures (EPC) 
and human dermal fibroblasts (HDF) were used as controls. Under endothelial 
conditions, all cells except HDF express endothelial but not neural markers. 
Under neural conditions, cells from the same GFP+/CD133+ clone are positive 
for GFAP and nestin but not endothelial markers, while controls are negative 
for all markers. Scale bar, 50 µm. Errors are s.d. 

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progenitors within GBM can arise from the CD133* cell population 
and are capable of differentiating into endothelial cells of tumour origin. 
Of note, the tumour cells used in these co-culture experiments originate 
from tumours with different genetic backgrounds and transcriptomal 
subclasses (Supplementary Tables 2 and 3). 

Recent data support a close interaction or a lineage relationship 
between endothelial cells and neural stem cells. We next explored 
whether endothelial differentiation of CD133+/CD144− cells can be further 
promoted by extrinsic signals. To this end, CD133+/CD144− cells were 
isolated from GBM samples, stably transduced with EFα-1::GFP lentiviral 
vectors, sorted for GFP+/CD133+/CD144− and co-cultured with 
tumour-derived endothelial cells. GFP-expressing endothelial cells 
were identified at 7–10 days in vitro as demonstrated by co-labelling 
of GFP with CD105 and CD31, and also incorporation of DiI-AcLDL. 
Control experiments using GFP-labelled CD133− cells did not yield 
any endothelial cells (Supplementary Fig. 4b). The CD133+/CD144− 
population formed neurospheres and readily differentiated along the 
three main CNS lineages (Supplementary Fig. 4c). Whereas these data 
are suggestive of the multipotent nature of the CD133+/CD144− cells, 
they do not rule out the presence of heterogeneous populations within 
the CD133+/CD144− fraction with distinct differentiation potentials. 
We thus performed single-cell clonal studies of CD133+/CD144− cells, 
with normal endothelial cells and fibroblasts as controls (Supplementary 
Fig. 4d). The data demonstrate both endothelial and neural 
differentiation potential within a single-cell-derived clone, confirming 
that CD133+/CD144− cells are capable of generating tumour cells 
and tumour-derived endothelium (Fig. 3e, f). FISH for EGFR and 
chromosome 7 in the clones confirmed the presence of genomic amplifications 
identical to those exhibited by the parent tumour tissue 
(Supplementary Fig. 4e). 

We next tested the fate of the various tumour cell fractions upon 
transplantation in vivo. CD133+/CD144−, DP, CD144+/CD133− and 
DN cells were injected into the striatum of immunodeficient mice. All 
grafted animals developed tumours with the exception of those that 
received cells from the DN and CD144+/CD133− fractions. DP and 
CD133+/CD144− cells gave rise to large, highly infiltrative and 
hyperproliferative tumours showing strong expression of nestin (Fig. 4a). Whereas 
all xenograft tumours had a comparable volume and proliferation rate, 
the DP-derived tumours showed significantly increased levels of 
vascularization as demonstrated quantitatively (Supplementary Fig. 5a). 

Some of the animals were grafted with stably GFP-marked CD133+ 
cells, allowing us to serially passage GFP-labelled CD133+/CD144− 
cells from the primary xenograft in NOD-SCID mice. Secondary 
tumours formed at similar efficiency and showed comparable cell 
composition to the first-passage cells. FACS analysis of GFP-labelled 
xenograft cells demonstrated expression of endothelial markers, 
including CD105 and CD34 (Fig. 4b). After a second passage in vivo, 
tumours were sorted again for GFP+/CD133+/CD144− cells, which 
upon culture gave rise to GFP-labelled CD31+ and CD105+ cells, thus 
demonstrating maintenance of the multipotential phenotype (Fig. 4c). 
Immunohistochemical analysis, including confocal microscopy, 
demonstrated tumour blood vessels with typical morphology that 
express human markers. Tumour-bearing animals were also injected 
systemically with lectin, resulting in vessel-specific uptake and co- 
labelling with human markers (Fig. 4d, e and Supplementary Fig. 6). 
Thus, multipotency, including differentiation capacity along 
endothelial lineages, is maintained within the CD133+/CD144− 
population in vivo and upon passaging. However, in the absence of 
clonal studies in vivo, true multipotency of tumour stem-like cells 
cannot be definitively confirmed. 

A more comprehensive and quantitative analysis of genomic aber- 
rations was conducted in order to verify the lineage relationship 
among the different tumour subpopulations. qPCR for the EGFR 
exons as described above demonstrates the highest copy number 
within the CD133+/CD144− population followed by the endothelial 
progenitors (DP) and the CD105+ cells (Supplementary Fig. 5b). 
Interestingly, the CD31+ cells and the CD144+/CD133− progenitors 
showed lower levels of amplification, indicating that they may include 
a significant proportion of genotypically normal cells. We propose that 
these cells largely represent normal endothelium and circulating 
endothelial progenitors, respectively. This is compatible with the more 
restricted endothelial fate demonstrated by the CD144+/CD133− cells 
as shown above (Supplementary Fig. 3c, e, f). Quantitative FISH studies 
for copy number of EGFR and chromosome 7 per cell were performed 
on CD133+/CD144−, DP and CD105+ cells and revealed a substantial 
proportion of cells bearing the neoplastic aberrations in each popu- 
lation, ranging from 47.3% to 71.7% (Supplementary Fig. 5c). To 
address genomic alterations in tumour cells in a more unbiased man- 
ner we performed array comparative genomic hybridization (CGH) on 
the fractionated populations (Supplementary Fig. 7). The CGH data 
showed similar patterns of genomic aberrations in tumour cells as well 
as the endothelium and its progenitors, at variable amplitudes and 
across different regions, thus demonstrating a similar paradigm even 
in tumours that do not exhibit EGFR gain. We performed transcrip- 
tome analyses on a set of 18 tumours used in this study and found a 
random distribution of commonly described genotypes as well as rep- 
resentation of all TCGA-defined transcriptomal classes (Supplemen- 
tary Table 3). Finally, we performed metaphase spreads on purified cell 
fractions of CD133+/CD144−, DP and CD105+ cells following short-term 
culture. The majority of the cells had a highly abnormal but near-diploid 
karyotype, indicating that nuclear fusion is a very unlikely 
explanation for the lineage transition from cancer cell to endothelial 
progenitor or mature cell (Supplementary Fig. 5d). Vascular mimicry 
has been described in melanoma and other tumours; aneuploidy 
was also shown in renal cell cancer endothelium, but not matched to 
parent tumour cells. 

We investigated the impact of DAPT (N-[N-(3,5-difluorophenacetyl)- 
L-alanyl]-S-phenylglycine t-butyl ester), a γ-secretase inhibitor that 
effectively inhibits Notch signalling, and bevacizumab, a VEGFA-binding 
antibody currently in clinical use, on the differentiation of 
CD133+/CD144− to DP and then to CD105+ cells. Exposure to 
bevacizumab did not have an impact on the ability of CD133+/CD144− 
cells to differentiate into endothelial progenitors, yet it 
blocked further maturation from DP into CD105+ endothelial cells. 
In contrast, γ-secretase inhibition resulted in significant suppression of 
the transition from CD133+/CD144− to DP, but did not affect maturation 
to CD105+ cells. To demonstrate the specific roles of the Notch 
and VEGF pathways, we performed knockdown studies targeting the 
NOTCH1 and VEGFR2 receptors. The gene silencing data further 
supported the results of the inhibitor studies (Supplementary Figs 8b 
and 9). Gene expression analysis shows significant upregulation of 
NOTCH1/2 and VEGFR1/2 in the CD133+/CD144− and DP groups, 
respectively (Supplementary Fig. 8). These preliminary studies offer a 
novel perspective of the roles of the VEGF and Notch pathways in 
glioma biology, although the functional consequences of VEGF or 
Notch blockade remain to be determined. 

Figure 4 | Cancer stem-like cells and endothelial progenitors give rise to 
tumour and endothelial cells in vivo. a, Representative magnetic resonance 
imaging (MRI) images from mice that received injection of DN, CD133+/CD144− 
or DP cells from primary GBM specimens. T2 sequences demonstrate 
infiltrative tumours except in the DN group. Tumours were hypercellular on 
haematoxylin and eosin (H&E), showed high proliferation rates (Ki67) and 
nestin expression. Immunostaining for human-specific CD31 demonstrates 
the presence of vessels of human origin within the tumours. NA, human 
nuclear antigen. b, FACS plots (left) and quantitative analysis (right) for 
endothelial marker expression in xenograft tumours (GFP+/CD133+/CD144− 
cells) and controls (DN) (n = 3, s.d.). 7-AAD, 7-aminoactinomycin. FL-1 and 
2, fluorescent channels 1 and 2; mIgG, mouse immunoglobulin G. c, Xenograft-derived 
GFP+/CD133+/CD144− cells express endothelial markers upon in 
vitro differentiation (arrows). d, Uptake of systemic lectin in tumour xenografts 
demonstrates blood vessels that co-label with human endothelial markers 
(CD31 and CD105). e, Confocal microscopy of xenograft microvasculature. 
Scale bars, 100 µm in a; 50 µm in c; 140 µm in d; 10 µm in e. 

Despite some promise, bevacizumab therapy is often interrupted by 
GBM progression characterized by a decrease in abnormal vascularity 
and significant invasive tumour behaviour. Based on the paradigm 
presented here (Supplementary Fig. 9a), bevacizumab failure could 
conceivably be due to the disruption of the dynamic relationships 
between the tumour fractions. 

In summary, our data demonstrate that a subpopulation of cells 
within GBM can give rise to endothelial cells via a bipotential progenitor 
intermediate, and that the CD133+ cancer stem-cell-like fraction 
includes a population of endothelial progenitors. An in-depth understanding 
of the lineage relationship between tumour cells and endothelial 
progeny should provide new insights into CSC biology and tumour 
self-renewal. Given the strong correlation of tumour grade and neoplastic 
vasculature in human gliomas, agents that could block endothelial 
transition of tumour cells may provide a novel therapeutic strategy for 
this currently intractable disease. 


METHODS SUMMARY 


All experiments were conducted on freshly obtained surgical specimens of glio- 
blastoma tumour; a neuropathologist confirmed the diagnosis on frozen section 
before tissue acquisition. Tumours were newly diagnosed or recurrent. A total of 
78 tumours were used in the study. Cell fractions were sorted using standard 
methods at our FACS facility; in vitro experiments were conducted on short- 
passage cultures (maximum of five passages) if needed. A total of 34 xenografts 
were obtained in immunodeficient mice following intrastriatal implantation of cell 
populations as indicated in the Methods. A lentiviral vector expressing GFP under 
a PGK promoter (gift from M. Sadelain) was used for cell labelling and sorting. 
Cytogenetic analyses were conducted using standard methods at the Cytogenetics 
Core facility at Sloan Kettering Cancer Center. Knockdown experiments were 
performed using lentiviral vectors expressing shRNA for NOTCH1 or VEGFR2 
(Santa Cruz). All experiments were carried out in triplicate or greater. Data are 
expressed as mean ± s.d. P values were determined using a two-tailed Student’s 
t-test. A P value of <0.05 was considered significant. Tissues were obtained after 
patients’ written consent under a protocol approved by the institution’s Institutional 
Review Board. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Tissue processing. Surgical specimens were collected from the surgical suite at 
Memorial Sloan Kettering Cancer Center, following diagnostic confirmation by a 
neuropathologist. Tissues were obtained after patients’ written consent under a 
protocol approved by the institution’s Institutional Review Board. Tumours were 
cut mechanically first (McIlwain Tissue Chopper) then dissociated into single cells 
with Liberase Blendzyme 1 (Roche) as described previously. 

Single cells were blocked with human FcR (1:20, Miltenyi Biotec) at 4°C for 
20 min before incubation with primary antibodies for 30 min. Cells from xeno- 
grafts were further blocked with 2,4-G2 (1:100; Santa Cruz Biotechnology) before 
incubation with antibodies. Cells were incubated with primary antibodies, washed 
and reincubated with appropriate secondary antibodies and resuspended in FACS 
buffer (containing 1× Ca2+/Mg2+-free HBSS (Invitrogen), 10 mM HEPES, 
0.156% glucose and 0.5% low-endotoxin BSA fraction V, all from Sigma 
(Sigma-Aldrich), at a pH of 7.2) with 1 µg ml−1 7-aminoactinomycin D (7-AAD, 
BD Pharmingen) before analysis. Mouse IgG1 or secondary antibody alone served 
as control for nonspecific binding. Samples were analysed on a FACS Aria flow 
cytometer with CellQuest software (BD Biosciences) and data were analysed using 
FlowJo software (Tree Star). A minimum of 10,000 events were counted and cell 
surface expression was analysed in 7-AAD-negative live cells. Antibodies used 
include: phycoerythrin- or allophycocyanin-conjugated anti-CD133 (1:20, 
Miltenyi Biotec); FITC-conjugated anti-CD144 (1:20, Abcam), anti-CD105 
(1:20, BD Biosciences) and anti-CD31 (1:20, BD Biosciences); mouse anti-human 
CD31 (1:40; BD Biosciences), mouse anti-human CD105 (1:40; Dako), mouse 
anti-human VEGFR2 (1:40; Abcam), mouse anti-human CD34 (1:20; Abcam), 
mouse anti-human CD144 (1:20; Abcam); mouse anti-human CD133 antibodies 
(AC141 and AC133 epitopes (1:20 each), Miltenyi). FITC-conjugated lectin and 
tetramethyl rhodamine isothiocyanate (TRITC)-conjugated lectins were pur- 
chased from Vector and Sigma separately. 

DNA and RNA preparation. FACS-sorted cell populations from 21 glioblastoma 
patients were used to extract total RNA using an Absolutely RNA Nanoprep kit 
(Stratagene) or an RNeasy Kit (Qiagen). All RNA samples were pre-treated with 
DNase. Sorted cell populations from eight glioblastoma patients were used to 
isolate genomic DNA using the Picopure DNA extraction kit (Molecular 
Devices), followed by phenol (Invitrogen) extraction. 

In vivo studies. Adult female NOD/SCID or male NOD/SCID gamma (NSG) 
mice (Jackson Laboratory) were anaesthetized with ketamine/xylazine (Hospira) 
and placed in a stereotaxic frame (Stoelting Company). Freshly sorted cells were 
injected into the right striatum immediately after sorting at the following 
coordinates (relative to bregma): AP = +0.5, ML = −2, and DV = −2.7. Animals 
received 10,000 cells each of CD133+/CD144−, CD133−/CD144+ or DP. NSG 
mice received 500 cells each of CD133+/CD144− or DP. DN cells were used in 
three separate doses (10,000, 50,000 and 100,000 cells). Animals were killed upon 
exhibiting symptoms. Some animals received FITC-conjugated lectin by retro- 
orbital injection before killing. Total animals grafted n = 40. 

The gel implantation assay was modified from ref. 11. Briefly, GBM- or DP-derived 
CD105+ cells (10° or 2 × 10° per ml) were resuspended in Collagen IV 
(Chemicon). GBM- or DP-derived cell-gel suspension (500 µl) was injected 
subcutaneously below the xiphoid in four or three mice, respectively. Some animals 
received TRITC-conjugated lectin by tail vein injection before killing. After 
transplantation (21 days) the implants were retrieved, fixed overnight in 4% (v/v) 
buffered formalin at 4 °C, embedded in Optimal Cutting Temperature 
Compound (O.C.T. compound, Sakura Finetek) and sectioned on a freezing 
cryostat (Leica) for histological examination. Animals were housed and cared for in 
accordance with the National Institutes of Health (NIH) guidelines for animal 
welfare and all animal experiments were performed in accordance with protocols 
approved by our Institutional Animal Care and Use Committee (IACUC). 

Animal imaging. In vivo magnetic resonance imaging was performed on a Bruker 
Biospec 4.7-Tesla 40-cm horizontal bore magnet. The system is equipped with a 
200 mT m−1 gradient system. Examinations were conducted using a 72-mm birdcage 
resonator for excitation, and detection was achieved using a 3-cm surface coil. 
T2-weighted spin echo images were acquired consecutively using a rapid-acquisition 
relaxation enhanced sequence (RARE). Animals were anaesthetized with 2% 
isoflurane in an N2/O2 mixture. 

Immunofluorescence. Primary antibodies were chicken anti-GFP (1:1,000; 
Chemicon), mouse anti-human CD31 (1:400; Abcam); mouse anti-human CD34 
(1:400; Abcam), mouse anti-human CD105 (1:400; Dako); mouse anti-human VWF 
(1:100; Dako); mouse anti-human VEGFR2 (1:200; Abcam); mouse anti-human 
Ki67 (1:400; Dako); mouse anti-human NCAM (1:150; Santa Cruz Biotechnology), 
mouse anti-human mitochondria (1:200; Chemicon), mouse anti-human nestin 
(1:400; Millipore), mouse anti-human nuclear antigen (1:500; Chemicon), rabbit 
anti-human GFAP (1:1,000; Chemicon), mouse anti-human Tuj1 (1:500; Covance), 
mouse anti-O4 (1:200; Chemicon), rabbit anti-human caldesmon (1:400; Novus 
Biology). The following secondary antibodies were used: Alexa Fluor 488- 
conjugated goat anti-chicken or mouse or rabbit (1:1,000), Alexa Fluor 555- 
conjugated goat anti-mouse or rabbit (1:1,000), Alexa Fluor 555-conjugated goat 
anti-mouse IgM (1:500), all from Molecular Probes (Invitrogen). 

Cell culture and clonal assays. GFP labelling was obtained by incubation with a 
PGK-GFP lentiviral vector (gift from M. Sadelain). For sphere cultures, freshly 
sorted CD133+/CD144−, DP and CD133−/CD144+ cells were cultured under 
clonal conditions (1,000 cells per cm2 or 5 cells per µl) in low-adherence plates 
(Corning) and maintained in serum-free Neurobasal medium supplemented with 
N2 (Invitrogen), 2 mM L-glutamine, 20 ng ml−1 recombinant human epidermal 
growth factor, and 10 ng ml−1 recombinant human fibroblast growth factor 2 (all 
from Invitrogen). Neurospheres were reseeded every 5 days after dissociation with 
Accutase (Innovative Cell Technologies). For neural differentiation, CD133+/CD144− 
cells were cultured in laminin-coated plates (BD Biosciences) using 
NeuroCult NS-A Differentiation Kit (human) (Stem Cell Technologies). 

For endothelial progenitor cells, freshly sorted DP or CD144+/CD133− cells 
were seeded on human fibronectin-coated plates (BD Biosciences) at a density of 
10° ml−1 with endo-cult liquid medium Kit (Stem Cell Technologies) for propagation. 
DP, CD144+/CD133− or CD133+/CD144−-derived DP cells were grown to 
75% confluence and switched to M199 medium (Invitrogen) for quantification of 
endothelial differentiation as described previously. GBM-derived CD105+ cells 
were grown in M199 medium for 2 days before FACS analysis. The functional 
assay for endothelial cells was performed by incubation of cells with 10 µg ml−1 of 
DiI-labelled acetylated low density lipoproteins (DiI-AcLDL) (Molecular Probes, 
Invitrogen) for 4 h. 

For DP induction culture, GFP-labelled CD133+/CD144− cells were co-cultured with 
tumour cells at a 20:1 ratio in N2 medium. The CD133+/CD144−-derived DP cells 
were sorted by FACS after 5 days for further characterization. A minimum of 100 
cells were counted in triplicate assays. They were cultured in three-dimensional 
collagen gel. For differentiation of CD133+/CD144− cells to endothelial cells, tumour 
endothelial cells and GFP-labelled CD133+/CD144− cells or control cells were 
resuspended in endo-cult medium and grown on fibronectin-coated plates for 
7 days at a ratio of 100:1. Single cell clonal assays were performed by seeding freshly 
sorted single GFP-labelled cells on multi-well plates. Wells containing single green 
cells were identified and monitored until clone formation was established. Single-cell- 
derived clones were further sub-cloned and propagated twice, dissociated and 
seeded under neural and endothelial differentiation conditions as described above. 
Human umbilical cord-derived CD133* endothelial precursor cells (Biochain) or 
human dermal fibroblasts (Cell Applications) were maintained as per manufacturer 
instructions and used as control in clonal analysis. 

Inhibitor studies. For drug treatment assays, cells were cultured in DP induction 
medium or endothelial differentiation medium containing 5 µM of the γ-secretase 
inhibitor DAPT (N-[N-(3,5-difluorophenacetyl)-L-alanyl]-S-phenylglycine 
t-butyl ester, Sigma-Aldrich) or 1 µg µl−1 of bevacizumab (Genentech). Treated 
cells were analysed by FACS analysis after 48 h incubation. VEGF was measured in 
the culture medium with a human VEGF ELISA Kit (Invitrogen) following the 
manufacturer directions. 

Knockdown studies. GBM-derived fresh DP and GFP-CD133sp cells were 
infected with shRNA virus targeting VEGFR2 or NOTCH1 or a control virus 
(all from Santa Cruz Biotechnology). The NOTCH1 shRNA lentiviral vector mix 
contains three target-specific constructs: 
CACCAGTTTGAATGGTCAATTCAAGAGATTGACCATTCAAACTGGTGTTTTT; 
CCCATGGTACCAATCATGATTCAAGAGATCATGATTGGTACCATGGGTTTTT; 
CCATGGTACCAATCATGAATTCAAGAGATTCATGATTGGTACCATGGTTTTT. 
The VEGFR2 shRNA lentiviral vector mix contains three target-specific constructs: 
ACTGTGGTGATTCCATGTCTTCAAGAGAGACATGGAATCACCACAGTTTTTT; 
ACTTGTAAACCGAGACCTATTCAAGAGATAGGTCTCGGTTTACAAGTTTTTT; 
CACCTGTTTGCAAGAACTTTTCAAGAGAAAGTTCTTGCAAACAGGTGTTTTT. 
The infected cells were selected with 2–4 µg ml−1 puromycin (Santa Cruz 
Biotechnology) and used for FACS analysis and/or collected for RT-PCR as 
described above after 5 days in selection. 

In vitro angiogenesis assay. Intracellular vacuole formation was evaluated by 
culturing CD133+/CD144−-derived DP cells in three-dimensional collagen gel 
as described in ref. 20. Tubular network formation was assessed by culture in 
growth factor reduced Matrigel assay Kit (BD Biosciences) following the protocol 
from ref. 19. 

Cytogenetic analyses and genomic PCR. Fluorescence in situ hybridization was 
performed using BAC clone RP11-339F13 and PAC clone RP5-1091E12 spanning 
the EGFR locus in 7p11, both labelled with Red-dUTP, together with a chro- 
mosome 7 centromere repeat DNA probe labelled with Green-dUTP targeted at 
the centromeric region of chromosome 7 (7p11.1-7q11.1 D7Z1 alpha satellite 
region). FISH was performed on sorted cells post cytospin on glass slides. A 
minimum of 100 cells in interphase were analysed. Human dermal fibroblasts 
(HDF) served as normal control. The false-positive rate for the FISH probes was 
determined as 1% (s.d. = 1.3) and the cut-off level for the diagnosis of amplification 
was set at 5% (>3 s.d.) (n = 3, total counted 2,000 control cells). 
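
The cut-off arithmetic can be checked with a short sketch. It assumes the threshold is the mean false-positive rate plus three standard deviations, which reproduces the stated 5% after rounding; the function names are illustrative and do not come from the study.

```python
def amplification_cutoff(mean_false_positive_pct, sd_pct, n_sd=3):
    """Mean false-positive rate plus n_sd standard deviations (assumed rule)."""
    return mean_false_positive_pct + n_sd * sd_pct


def is_amplified(percent_positive_cells, cutoff_pct=5.0):
    """Call a population amplified when the positive-cell percentage exceeds the cut-off."""
    return percent_positive_cells > cutoff_pct


# 1% + 3 x 1.3% = 4.9%, which the text rounds up to a 5% cut-off.
cutoff = amplification_cutoff(1.0, 1.3)
```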

FISH on tumour sections, as reported in Supplementary Tables 1b and 2, was 
performed independently by the Clinical Cytogenetics Facility at Memorial Sloan 
Kettering Cancer Center, as part of a now routine molecular diagnostic test. The 
probe used is the 7p12 LSI EGFR and the 7p11.1-7q11.1 CEP (D7Z1 alpha 
satellite) dual colour probe, purchased from Abbott Molecular. 

Fluorescence immunophenotyping and interphase cytogenetics, a technique 
combining immunohistochemistry for CD105 and FISH for EGFR, was carried 
out on 10-µm thick tissue sections. Normal human brain cerebral-cortex sections 
(Biochain) were used as controls. 

In a copy number quantification reaction by real-time PCR, EGFR primers were 
designed based on published data. Genomic DNA (10 ng) from sorted cells or 
normal human brain was used as template to examine the copy number of exons 4, 9 
and 11 in the EGFR gene; GAPDH was used as the reference gene. Each replicate was 
normalized to GAPDH to obtain a ΔCt, and then an average ΔCt value for each 
sample (from the three replicates) was calculated. All samples were then normalized 
to the calibrator sample (normal human brain) to determine ΔΔCt. Relative quantity 
(RQ) is 2^(−ΔΔCt), and copy number is 2 × RQ. The EGFR copy number in each 
population was defined by the average of copy number from three exons. Error bars 
indicate the range of the data from the three exons in each of the three samples. 
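
The copy number calculation above can be sketched in a few lines. This is a minimal illustration of the ΔΔCt arithmetic as described, not the authors' analysis script; the function names and the Ct values in the comments are hypothetical.

```python
from statistics import mean


def exon_copy_number(ct_exon, ct_gapdh, ct_exon_cal, ct_gapdh_cal):
    """Copy number for one EGFR exon from replicate Ct values.

    ct_exon / ct_gapdh: replicate Ct values for the sample.
    ct_exon_cal / ct_gapdh_cal: replicate Ct values for the calibrator
    (normal human brain in the text).
    """
    # Normalize each replicate to GAPDH, then average the replicate dCt values.
    dct_sample = mean(e - g for e, g in zip(ct_exon, ct_gapdh))
    dct_calibrator = mean(e - g for e, g in zip(ct_exon_cal, ct_gapdh_cal))
    ddct = dct_sample - dct_calibrator
    rq = 2 ** (-ddct)   # relative quantity
    return 2 * rq       # the calibrator is assumed to carry two copies


def population_copy_number(per_exon_copy_numbers):
    """Average the copy numbers obtained from the individual exons."""
    return mean(per_exon_copy_numbers)
```

For example, a sample whose exon signal appears three cycles earlier than in the calibrator, relative to GAPDH, has ΔΔCt = −3, RQ = 8 and an estimated copy number of 16.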

Karyotype analysis was performed on metaphase spreads of FACS-purified cell 
subpopulations that were in culture for 3 days. The cultures were treated with 
Colcemid (0.1 µg ml−1) for 1.5 h before in situ metaphase preparation according 
to standard cytogenetics procedures. 

CGH studies. Comparative genomic hybridization (CGH) assay was performed 
by hybridizing genomic DNA from sorted cells with 44K human genome CGH 
arrays, and frozen banked whole tumour on 244K and 1M human genome CGH 
arrays (all commercial arrays from Agilent). DNA from sorted cells was prepared 
as described above. DNA extraction, purification, labelling and hybridization were 
performed at Sloan Kettering Cancer Center’s Genomics Core Facility according 
to the manufacturer’s instructions. Log2 ratios were normalized by Lowess against 
probe intensity and mean %GC of the genomic region mapped to by the probe. 
Segmentation of normalized log2 ratios was by Circular Binary Segmentation 
(CBS, R package DNAcopy). 

A separate method was used to investigate whether an amplicon identified by 
CBS in one cell fraction might be present in a minor subpopulation in another cell 
fraction at a level not detected by CBS. A region of interest (ROI) is defined by the 
boundaries of the amplicon detected by CBS. Then this region is investigated in the 
CGH profiles of the other cell fractions as follows: the log2 ratios of the N probes 
under the ROI (within amplicon boundaries) are compared to log2 ratios of all the 
other probes in the entire chromosome by Student’s t-test (one-tailed). The 
observed t-score is then compared to the distribution of t-scores obtained by 
equivalently testing all other sets of N neighbouring probes in the chromosome. 
The ROI is considered to be significantly gained if the observed t-score is seen or 
exceeded in less than 0.1% of all other chromosomal regions. 
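
The ROI test described above can be sketched as follows. This is a simplified illustration under stated assumptions: a pooled-variance Student's t statistic, probes supplied as a list ordered along one chromosome, and a window that slides one probe at a time. The function names are hypothetical and this is not the authors' code.

```python
from math import sqrt
from statistics import mean, variance


def t_score(inside, outside):
    """Pooled-variance Student's t statistic for mean(inside) > mean(outside)."""
    n1, n2 = len(inside), len(outside)
    pooled = ((n1 - 1) * variance(inside) + (n2 - 1) * variance(outside)) / (n1 + n2 - 2)
    return (mean(inside) - mean(outside)) / sqrt(pooled * (1 / n1 + 1 / n2))


def roi_exceedance_fraction(log2_ratios, start, end):
    """Fraction of other N-probe windows whose t-score reaches the ROI's t-score.

    log2_ratios: probe values ordered along one chromosome.
    start, end: ROI probe indices (end exclusive), so N = end - start.
    """
    values = list(log2_ratios)
    n = end - start

    def window_t(s):
        # Compare the window with every other probe on the chromosome.
        return t_score(values[s:s + n], values[:s] + values[s + n:])

    observed = window_t(start)
    others = [window_t(s) for s in range(len(values) - n + 1) if s != start]
    return sum(t >= observed for t in others) / len(others)


def roi_is_gained(log2_ratios, start, end, threshold=0.001):
    """Apply the 0.1% criterion stated in the text."""
    return roi_exceedance_fraction(log2_ratios, start, end) < threshold
```

Comparing the observed t-score with an empirical distribution drawn from the same chromosome, rather than with a theoretical t distribution, keeps the test robust to probe-level autocorrelation in array CGH data.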

Expression microarray studies of whole tumours. Gene expression profiling was 
performed for a subset of 16 tumours for which additional frozen material was 
available using exon expression arrays (Human Exon 1.0, Affymetrix). RNA was 
extracted, labelled and hybridized at Sloan Kettering Cancer Center's Genomics 
Core Facility according to the manufacturer’s instructions. Data were normalized in 
a cohort of 80 gliomas using Aroma.affymetrix (R package aroma.affymetrix). 
Expression was derived for RefSeq transcripts, and multiple transcripts for the 
same gene were distilled to a single gene expression value by median. 
Transcriptomal class assignment was based on the nearest centroid of the four 
transcriptomal classes reported in ref. 13, using the subset of 840 signature genes 
described by this study (Supplementary Table 6; http://tcga-data.nci.nih.gov/docs/ 
publications/gbm_exp/). Distances to centroids were defined using Pearson cor- 
relation and class assignments made by the largest correlation value. If the largest 
correlation was <0.2, the sample was labelled ‘indeterminate’. Correlations and 
class assignments are given in Supplementary Table 6. 
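
The nearest-centroid rule can be sketched as below. It assumes each centroid and each sample is a vector of expression values over the same ordered signature genes; the class names and numbers used in any call are placeholders, not the published centroids.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def assign_class(sample, centroids, min_correlation=0.2):
    """Return (label, correlations) for one sample.

    centroids: dict mapping a class name to its centroid vector.
    The sample is labelled 'indeterminate' when its best correlation
    falls below min_correlation (0.2 in the text).
    """
    correlations = {name: pearson(sample, c) for name, c in centroids.items()}
    best = max(correlations, key=correlations.get)
    label = best if correlations[best] >= min_correlation else "indeterminate"
    return label, correlations
```

Returning the full correlation table alongside the label mirrors the way both correlations and class assignments are reported in the supplementary table.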

Microscopic analysis. Sections were examined with confocal laser scanning 
microscopy (Leica Microsystems; Carl Zeiss MicroImaging). The data were analysed 
with Velocity or LSM5 (Carl Zeiss MicroImaging) software. 

Tumour microvessel density (MVD) was assessed by quantification of the 
numbers of CD31+ tumour vessels in pixels using MetaMorph (Molecular 
Devices) image analysis software using unbiased sampling. 

Gene expression analysis and quantitative real-time PCR for sorted cell 
populations. Total RNA of four subpopulations from two specimens was hybridized 
with human U133-plus2 array at Sloan Kettering Cancer Center’s Genomics Core 
Facility and according to the manufacturer’s instructions. Reference databases, 
including one set of CD34+ human haematopoietic progenitor cells (GSM476781) 
and two independent sets of human embryonic stem cell-derived endothelial 
progenitors (GSM492830 and GSM492828) were downloaded from the Gene 
Expression Omnibus database. The array data were analysed with Partek software. 
The data from 11 samples were normalized by the RMA algorithm and the tumour 
samples then assigned to four groups based on the expression of membrane 
markers CD133 and CD144. The gene list was created by ANOVA with an unadjusted 
P value less than 0.05 and then used as input for unsupervised hierarchical 
clustering using a Euclidean similarity metric. 

For RT-PCR, total RNA (100–300 ng) was reverse-transcribed using random 
primers and SuperScript III (Invitrogen) according to the manufacturer’s 
instructions. Quantitative real-time PCR was performed with an Applied Biosystems 
Prism 7900HT Sequence Detection System using SYBR Green PCR Master Mix 
(Applied Biosystems). 

Primers: 
CD34 (F: TCTGATCTCCATGGCTTCCT; R: ACTGAGGCAACAGCTCAACC), 
CD144 (F: TCGTCATGGACCGAGGTT; R: TCTACAATCCCTTGCAGTGTGA), 
VEGFR2 (F: GCAGGGGACAGAGGGACTTG; R: GAGGCCATCGCTGCACTCA), 
CD31 (F: TTCCTGACAGTGTCTTGAGTGG; R: GCTAGGCGTGGTTCTCATCT), 
CD133 (F: TCTGGGTCTACAAGGACTTTCC; R: GCCCGCCTGAGTCACTAC), 
ACTIN (F: GCCCGCCTGAGTCACTAC; R: GGAATCCTTCTGACCCATGC), 
VEGFR1 (F: TCTCACATCGACAAACCAATACA; R: GGTAGCAGTACAATTGAGGACAAGA), 
VEGF (F: CTACCTCCACCATGCCAAGT; R: CCACTTCGTGATGATTCTGC). 

Human angiogenesis PCR arrays (SABiosciences) were used to examine the 
expression profiles of angiogenic genes in sorted cell populations. Heat Map 
construction and analysis of qPCR data was conducted according to ref. 31. 


Calcium-dependent phospholipid scrambling by TMEM16F 


Jun Suzuki, Masato Umeda, Peter J. Sims & Shigekazu Nagata 


In all animal cells, phospholipids are asymmetrically distributed 
between the outer and inner leaflets of the plasma membrane. 
This asymmetrical phospholipid distribution is disrupted in various 
biological systems. For example, when blood platelets are activated, 
they expose phosphatidylserine (PtdSer) to trigger the clotting system. 
The PtdSer exposure is believed to be mediated by Ca2+-dependent 
phospholipid scramblases that transport phospholipids 
bidirectionally, but its molecular mechanism is still unknown. 
Here we show that TMEM16F (transmembrane protein 16F) is an 
essential component for the Ca2+-dependent exposure of PtdSer on 
the cell surface. When a mouse B-cell line, Ba/F3, was treated with a 
Ca2+ ionophore under low-Ca2+ conditions, it reversibly exposed 
PtdSer. Using this property, we established a Ba/F3 subline that 
strongly exposed PtdSer by repetitive fluorescence-activated cell 
sorting. A complementary DNA library was constructed from the 
subline, and a cDNA that caused Ba/F3 to expose PtdSer spontaneously 
was identified by expression cloning. The cDNA encoded a 
constitutively active mutant of TMEM16F, a protein with eight 
transmembrane segments. Wild-type TMEM16F was localized on 
the plasma membrane and conferred Ca2+-dependent scrambling of 
phospholipids. A patient with Scott syndrome, which results from 
a defect in phospholipid scrambling activity, was found to carry a 
mutation at a splice-acceptor site of the gene encoding TMEM16F, 
causing the premature termination of the protein. 

When mouse Ba/F3 cells were treated with 1.0 μM A23187 for
15 min in the presence of 0.5 mM CaCl₂, the cells underwent necrosis
or became propidium iodide (PI)-positive. However, when the same
treatment was performed in Ca²⁺-free conditions, most of the cells
exposed PtdSer and the PI-positive population was low (Fig. 1a).
Chelating intracellular Ca²⁺ with bis-(o-aminophenoxy)ethane-
N,N,N',N'-tetra-acetic acid acetoxymethyl ester (BAPTA-AM)
blocked the PtdSer exposure (Fig. 1b), indicating that the process
required the mobilization of intracellular calcium. This PtdSer
exposure was reversible: treatment of the PtdSer-exposing cells with
BAPTA-AM at 37 °C for 5 min (Fig. 1c) or culturing them in
Ca²⁺-free medium at 37 °C for 12 h (data not shown) eliminated the PtdSer
from the cell surface. These results suggest that under low-Ca²⁺
conditions, A23187 mobilized the intracellular Ca²⁺, which activated a
phospholipid scramblase to expose PtdSer. When the intracellular
Ca²⁺ concentration was lowered, the phospholipid scramblase lost
activity, and flippases returned the PtdSer to the inner leaflet.

To characterize the PtdSer-exposure process, we used its reversible
nature under low-Ca²⁺ conditions to establish a cell line that
overexposed PtdSer. Ba/F3 cells were treated with 1.0 μM A23187 in the
absence of calcium, and subjected to fluorescence-activated cell sorting
(FACS) based on PtdSer exposure. A population (0.5-5%) that showed
intense staining with Annexin V was collected, cultured for 15 h in
Ca²⁺-free medium, returned to normal medium, and subjected to
the next sorting. After this cycle of sorting and expansion had been
repeated 12 times, the cells (Ba/F3-PS12) showed roughly 100-fold
higher staining with Annexin V than the original Ba/F3 cells
(Ba/F3-PS0) on treatment with 125 nM A23187 (Fig. 1d). The sorting and
expansion were repeated another seven times, and the resulting cell
line (Ba/F3-PS19) was used for further studies.

There were two possible causes of the strong PtdSer exposure in Ba/ 
F3-PS19 cells. One was the overexpression or overactivation of phos- 
pholipid scramblase, and the other was the inactivation of flippase’® 
that transports PtdSer from the outer to the inner leaflet of the plasma 
membrane. To examine which possibility was correct, DsRed-expressing 
Ba/F3-PS19 cells were fused with green fluorescent protein (GFP)-
labelled parental Ba/F3 (Ba/F3-PS0) cells. The PtdSer-exposure response
of the hybrid cells to 1.0 μM A23187 was similar to, or slightly weaker
than, that of Ba/F3-PS19 cells (Fig. 1e), suggesting that the phenotype of
Ba/F3-PS19 cells was dominant to that of Ba/F3-PS0 cells, and that the
phospholipid scramblase was overactivated in Ba/F3-PS19 cells. To
identify the gene responsible for the enhanced phospholipid scramblase
activity, a cDNA library (9.3 X 10° clones) was prepared from
Ba/F3-PS19 cells, and introduced into the parental Ba/F3 cell line. The stably
transformed cells were treated with 125 nM A23187, and a population 
that stained strongly with Annexin V was sorted (Fig. 1f). At the third 
cycle of sorting and expansion (Library-Derived (LD)-PS3), about 35% 
of the cells exposed PtdSer without A23187 treatment, and this cell 
population (LD-PS4) was characterized. 

LD-PS4 cells carried two or three different cDNAs, but the Tmem16f 
cDNA (GenBank accession number NM_175344) was present in two 
independent experiments, suggesting that TMEM16F caused the PtdSer 
exposure. The two Tmem16f cDNAs identified in the different
experiments contained an A-to-G mutation at nucleotide 1226, which caused
an aspartic acid residue to be replaced by glycine at codon 409 (Fig. 2a).
TMEM16A, another member of the TMEM16 family, was recently
shown to be a Ca²⁺-dependent Cl⁻ channel!'-*. However, the
Cl⁻-channel activity of TMEM16F was lower than that of TMEM16A™.
To examine the function of TMEM16F, the wild-type and mutant
(D409G) forms of TMEM16F were tagged with Flag or monomeric
red fluorescent protein (mRFP) at the carboxy terminus, and expressed
in Ba/F3 or human 293T cells. Western blotting of the cell lysates with
anti-Flag showed broad bands at 125 and 500 kDa on SDS-PAGE
(Fig. 2b), suggesting that mouse TMEM16F (calculated molecular mass 
106 kDa) is glycosylated and/or aggregated. Observation of the 293T cells 
expressing TMEM16F-mRFP indicated that TMEM16F is located at 
the plasma membrane (Fig. 2c). 

Figure 1 | Molecular cloning of TMEM16F. a, Ba/F3 cells were treated with
A23187 with or without CaCl₂, and stained with Annexin V and PI. DMSO,
dimethylsulphoxide. b, Ba/F3 cells were incubated with BAPTA-AM and
treated with A23187. An Annexin V profile in a PI-negative population is
shown. Open curve, profile of resting cells. c, Ba/F3 cells were treated with
A23187 and then with BAPTA-AM for 5 min, and stained with Annexin V.
d, Ba/F3 cells and cells after sorting for 12 cycles (PS12) were treated with
A23187 and stained with Annexin V. e, GFP and DsRed profiles of PS0, PS19
and PS0/19 hybrid cells are shown. Bottom: the same cells were treated with
A23187 and stained with Annexin V. f, Ba/F3 cells transformed with PS19
cDNA library were treated with A23187, stained with Annexin V and sorted
(LD-PS0). Annexin V and PI profiles of cells after first (LD-PS1) and third
(LD-PS3) sorting are shown. Right: Annexin V profile of original cells (LD-PS0)
and after fourth sorting (LD-PS4) without A23187.

Annexin V was able to bind to the Ba/F3 cells expressing the D409G
mutant, but not the wild-type, TMEM16F (Fig. 2d), suggesting that the
mutant TMEM16F-expressing cells constitutively expose PtdSer. This
was confirmed by binding of MFG-E8, which specifically binds to
PtdSer'>!° (Supplementary Fig. 1). Chelating the intracellular Ca²⁺
with BAPTA-AM decreased the exposed PtdSer level in the mutant
TMEM16F-expressing cells (Fig. 2d). When cells expressing wild-type
TMEM16F were treated with A23187, PtdSer was exposed without a
lag time, reaching saturation more quickly than the vector-transformed
Ba/F3 cells (Fig. 2e). The intracellular Ca²⁺ concentration and the
kinetics of the Ca²⁺ influx after treatment with A23187 were similar
among the vector-transformed cells and those expressing wild-type and
D409G mutant TMEM16F (Supplementary Fig. 2). These results
indicated that TMEM16F mediates a Ca²⁺-dependent scramblase activity
for PtdSer, and that its D409G mutant is sensitized to respond to the
normal intracellular concentration of Ca²⁺ to expose PtdSer.

Phospholipid scramblase mediates the bidirectional transfer between
plasma membrane leaflets of all phospholipids. Cells expressing the
D409G mutant TMEM16F were stained with Ro09-0198 (Supplementary
Fig. 3a), a tetracyclic polypeptide that specifically binds
phosphatidylethanolamine (PtdEtn)”, indicating that they constitutively exposed
PtdEtn, a phospholipid that, like PtdSer, is normally sequestered to the
inner leaflet. Treatment of Ba/F3 cells with A23187 caused exposure of
PtdEtn. This process was accelerated by overexpressing wild-type
TMEM16F (Supplementary Fig. 3b). When
1-oleoyl-2-{6-[(7-nitro-2-1,3-benzoxadiazol-4-yl)amino]hexanoyl}-sn-glycero-3-phosphocholine
(NBD-PtdCho) was added to the culture, it was quickly internalized by
the D409G-mutant-expressing cells (Fig. 2f): of the cell-associated
NBD-PtdCho, more than 40% became resistant to extraction with
BSA within 6 min. When the cells expressing wild-type TMEM16F were
treated with A23187, they incorporated NBD-PtdCho faster than the
parental cells, and about 40% of the cell-associated NBD-PtdCho was
inside the cells within 4 min (Fig. 2g). Similar results (that is,
constitutive internalization by cells expressing the mutant TMEM16F, and
enhanced A23187-induced incorporation by cells expressing wild-type
TMEM16F) were obtained with
N-{6-[(7-nitro-2-1,3-benzoxadiazol-4-yl)amino]hexanoyl}-sphingosine-1-phosphocholine
(NBD-SM) (Supplementary Fig. 4). The internalized NBD-PtdCho and NBD-SM were
intact (Supplementary Fig. 5). Dynasore, which inhibits
dynamin-mediated endocytosis’*, inhibited the internalization of these
phospholipids only slightly or not at all (Supplementary Fig. 6), suggesting that
the contribution of endocytosis to TMEM16F-mediated phospholipid
internalization may not be great.

Expression of endogenous TMEM16F in Ba/F3 cells was then
knocked down by expressing Tmem16f short hairpin RNA (shRNA).
As shown in Fig. 3a and Supplementary Fig. 7, the expression level of
Tmem16f messenger RNA in five transformants was decreased to 20-35%
of that in the cells expressing the control shRNA. The rate of
A23187-induced exposure of PtdSer and PtdEtn was decreased in these
transformants (Fig. 3b, c). Similarly, the uptake of NBD-PtdCho and
NBD-SM was slower in Tmem16f shRNA-transformed cells (Fig. 3d, e).

Figure 2 | Phospholipid scrambling in TMEM16F-expressing cells.
a, Schematic representation of mouse TMEM16F and D409G mutant.
b, Western blotting of Ba/F3 cells expressing Flag-tagged wild-type and mutant
TMEM16F with anti-Flag. Arrowheads, monomer and multimer of
TMEM16F. c, 293T cells expressing TMEM16F-mRFP were observed under a
fluorescent microscope. Scale bars, 10 μm. d, Vector-transformed Ba/F3 cells,
or cells expressing wild-type or mutant TMEM16F, were stained with
Annexin V with or without pretreatment with BAPTA-AM. e, Vector-
transformed or wild-type TMEM16F-Ba/F3 cells were preincubated with
Annexin V. After addition of A23187, the fluorescence was monitored. The
y axis shows fluorescence intensity on FACS. f, Vector-transformed or mutant
TMEM16F-expressing Ba/F3 cells were incubated for 8 min at room
temperature (26-27 °C) with 0.5 μM NBD-PtdCho in Hanks balanced salt
solution containing Ca²⁺. After dilution with fatty-acid-free BSA buffer, the
fluorescence intensity was determined by FACS. g, Vector-transformed or
wild-type TMEM16F-expressing Ba/F3 cells were preincubated at 4 °C with
0.1 μM NBD-PtdCho. A23187 (A23) was added and incubated for 8 min at
room temperature, and internalized NBD-PtdCho was determined as above. In
f and g the percentage of BSA-non-extractable NBD-PtdCho was determined
in triplicate at 4 min (f) or 6 min (g) after the addition of NBD-PtdCho and is
plotted as mean and s.d. All experiments were performed at least three times.

Figure 3 | Requirement of TMEM16F for phospholipid scrambling.
a, Ba/F3 transformants expressing shRNA for Tmem16f (sh16F) or scrambled
shRNA (shCon). Tmem16f mRNA level was normalized to β-actin mRNA and
is shown as relative expression. b, c, Ba/F3 cells expressing sh16F or shCon were
preincubated with Cy5-Annexin V (b) or biotin-Ro09-0198 (RO) and
allophycocyanin (APC)-labelled streptavidin (c). A23187 was added and
fluorescence was monitored. d, e, Ba/F3 cells expressing sh16F or shCon were
preincubated with 0.5 μM NBD-PtdCho (d) or NBD-SM (e) in Hanks balanced
salt solution containing Ca²⁺. A23187 was added, incubated and diluted with
fatty-acid-free BSA buffer, and fluorescence was determined. Experiments in
b-e were performed at least three times.

Platelets and other blood cells from patients with Scott syndrome
show a defect in their ability to expose PtdSer in response to a Ca²⁺
ionophore”””. B-cell lines have been established from a patient with
Scott syndrome and from the patient’s parents”. In agreement with
previous reports*”°, the patient-derived cells did not expose PtdSer in
response to a Ca²⁺ ionophore (Fig. 4a). In contrast, A23187 elicited
PtdSer exposure in cell lines derived from the patient’s parents at the
same levels as in cell lines from healthy volunteers. An RT-PCR analysis
of the TMEM16F mRNA (GenBank accession number NM_001025356)
showed that the 5′ part (1,320 base pairs (bp)), corresponding to exons
1-12, was identical in the patient and the parents, whereas its 3′ half,
corresponding to exons 11-20, was shorter in the patient than in the
parents (Fig. 4b). A sequence analysis indicated that the cDNA of the
patient lacked the 226-bp sequence corresponding to exon 13. Direct
sequencing of the chromosomal DNA indicated that the TMEM16F
gene of the patient carried a G-to-T homozygous mutation at the
splice-acceptor site in intron 12, whereas both parents were heterozygous
for the mutation at this position (Fig. 4c). PCR analysis of the TMEM16F
mRNA with primers at exons 12 and 16 showed a 608-bp band from the
control and a 382-bp band from the cell line from the patient with Scott
syndrome (Fig. 4d), indicating that a mutation in the splice acceptor site
caused exon 13 to be skipped. This skipping caused a frame shift resulting
in the premature termination of the protein in exon 14 (Fig. 4e) at the
third transmembrane segment of human TMEM16F (Fig. 4f). The
nonsense-mediated mRNA decay”' may explain the decreased concentration
of the exon-13-deleted form of TMEM16F mRNA in the patient’s
parents (Fig. 4d).
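
The band sizes reported for the exon 12-16 RT-PCR can be checked with simple arithmetic. The following Python sketch is only an illustration (the function names are hypothetical and not part of the study): it subtracts the 226-bp exon 13 from the 608-bp control product and tests whether the deleted length is a multiple of three, which is the condition for keeping the reading frame.

```python
# Sizes are taken from the text; the function names are illustrative only.
CONTROL_PRODUCT_BP = 608   # exon 12-16 RT-PCR product from the control
EXON13_BP = 226            # length of the skipped exon 13


def product_after_exon_skip(full_length_bp: int, exon_bp: int) -> int:
    """Expected RT-PCR product size when one exon is missing from the mRNA."""
    return full_length_bp - exon_bp


def shifts_reading_frame(deleted_bp: int) -> bool:
    """A deletion shifts the frame unless its length is a multiple of 3."""
    return deleted_bp % 3 != 0


patient_product_bp = product_after_exon_skip(CONTROL_PRODUCT_BP, EXON13_BP)
print(patient_product_bp)               # 382, the size of the patient band
print(shifts_reading_frame(EXON13_BP))  # True, because 226 = 3 x 75 + 1
```

Because 226 is not divisible by three, skipping exon 13 shifts the downstream reading frame, which is consistent with the premature termination described for exon 14.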

Figure 4 | A splice mutation of TMEM16F in a patient with Scott syndrome.
a, Cells from control, from a patient with Scott syndrome and from the patient’s
parents, were preincubated with Annexin V. A23187 was added and
fluorescence was monitored. b, RT-PCR for TMEM16F mRNA for exons 1-12
and 11-20 with RNA from the patient and parents. c, The junction between
exon 13 and intron 12 of the TMEM16F gene sequenced from the 3′ end. The
CT complementary to the splice acceptor site AG is underlined. Arrowheads

Repeated FACS analysis has been used previously to establish cell
lines that overexpress a particular cell-surface protein’*”’. Here, this
method yielded TMEM16F carrying a point mutation that rendered
the process extremely sensitive to Ca²⁺, such that in the cells expressing
the mutated TMEM16F the phospholipid scramblase functioned even
in resting cells, in which the cytosolic Ca²⁺ concentration was below
100 nM (ref. 24). The TMEM16 family, to which TMEM16F belongs,
consists of ten members in humans and mice’. The founding member of
the family, human TMEM16A, is a Ca²⁺-dependent Cl⁻ channel!)®.
Although the direct binding of Ca²⁺ to TMEM16 members has yet to be
demonstrated, the amino-terminal region of TMEM16A seems to have
a regulatory role”. Similarly, the increased sensitivity of the D409G
mutant to Ca²⁺ suggests that either Ca²⁺ or a Ca²⁺-sensing molecule
binds to this N-terminal region of TMEM16F. The overexpression of
TMEM16A in Ba/F3 cells had no effect on the ionophore-induced
exposure of PtdSer (data not shown), suggesting that different
members of this family have distinct functions. The PtdSer exposure or
scrambling of phospholipids occurs in other biological processes’*”*”?,
such as apoptotic cell death, the fusion of muscle, bone or trophoblast
cells, and the release of neurotransmitters and microvesicles. It will be
interesting to study whether TMEM16F and/or its related members in
the TMEM16 family are involved in these processes.


METHODS SUMMARY 


To expose PtdSer reversibly on the cell surface, Ba/F3 cells were treated at 37 °C with
A23187 under Ca²⁺-free conditions. The exposed PtdSer was detected by binding
of Annexin V at 4 °C in Ca²⁺-containing Annexin V-binding buffer. A subline
(Ba/F3-PS19) of Ba/F3 cells that was extremely sensitive to Ca²⁺-ionophore-elicited
PtdSer exposure was selected by repeating the sorting 19 times with FACSAria (BD
Bioscience). A cDNA library was established with mRNA from Ba/F3-PS19 cells in
retrovirus vector, and the cDNA (Tmem16f) that caused Ba/F3 cells to expose
PtdSer constitutively was identified by expression cloning. The Epstein-Barr virus 
(EBV)-transformed cell lines from a patient with Scott syndrome and from the 
patient’s parents were described previously”°. The TMEM16F mRNA in these cell 
lines was analysed by RT-PCR. The TMEM16F chromosomal gene was amplified 
by PCR from the genomic DNA of the cell lines, and was directly sequenced by cycle 
sequencing with an ABI 3100 genetic analyser (Applied Biosystems). Exposure of 
PtdSer and PtdEtn on the cell surface was analysed by the binding of Cy5-labelled 
Annexin V and biotin-labelled Ro09-0198 (ref. 17), respectively. The internaliza- 
tion of NBD-PtdCho and NBD-SM was analysed by the BSA-extraction method 
essentially as described*’. For the knock-down experiment, shRNA-retrovirus vec- 
tors for Tmem16f and control scrambled sequence were obtained from OriGene, 
and the resultant retrovirus was used to infect Ba/F3 cells. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Cell lines, recombinant proteins, antibodies, serum and reagents. Mouse
interleukin-3 (IL-3)-dependent Ba/F3 cells were maintained in RPMI medium
containing 10% fetal calf serum (FCS; Gibco), 45 U ml⁻¹ recombinant mouse IL-3 and
50 μM 2-mercaptoethanol. The EBV-transformed human cell lines” from a
patient with Scott syndrome and the patient’s parents were grown in RPMI1640
medium containing 10% FCS and 50 μM 2-mercaptoethanol. Human 293T cells
and Plat-E packaging cells*' were cultured in DMEM medium containing 10%
FCS. Recombinant mouse IL-3 was produced by mouse C127I cells transformed
with a bovine papillomavirus expression vector bearing mouse IL-3 cDNA as
described*’. Biotin-labelled Ro09-0198 was prepared as described previously’. 
Flag-tagged mouse MFG-E8 was produced in human 293T cells as described”, 
and the secreted MFG-E8 was purified with anti-Flag M2 beads (Sigma-Aldrich). 

Ca²⁺/Mg²⁺-free RPMI1640 medium was purchased from Cell Science &
Technology Institute. Ca²⁺-free RPMI medium contained 0.5 mM MgSO₄.
Ca²⁺-free FCS was prepared by dialysing FCS for 2 days against PBS, with four
changes of buffer. Dynasore was purchased from Calbiochem.

BAPTA-AM was from Dojindo. NBD-PtdCho and NBD-SM were purchased
from Avanti Polar Lipids.
Treatment with Ca²⁺ ionophore, flow cytometry, and cell sorting. To expose
PtdSer on the cell surface, 2 X 10° cells in a 96-well microtitre plate were washed
with PBS, resuspended in 200 μl of HBSS (Gibco) and treated with A23187
(Sigma-Aldrich) at 37 °C for 15 min. The cells were stained on ice for 15 min with
2,500-5,000-fold diluted Cy5-labelled Annexin V (Biovision) in staining buffer
(10 mM Hepes-NaOH pH 7.4 containing 140 mM NaCl and 2.5 mM CaCl₂) in the
presence of 5 μg ml⁻¹ PI. Flow cytometry was performed on a FACSAria (BD
Bioscience) or FACSCalibur (BD Bioscience) and the data were analysed with
FlowJo Software (Tree Star).

A subline of Ba/F3 cells that was sensitive to Ca²⁺-ionophore-elicited PtdSer
exposure was selected by repetitive sorting with a FACSAria. In brief, after 2 x 10”
Ba/F3 cells in HBSS had been treated at 37 °C for 15 min with A23187, they were
suspended in 1 ml of Annexin V staining buffer that had been prechilled to 4 °C.
The cells were stained with Cy5-Annexin V on ice as described above, and sorted
with a FACSAria whose injection chamber was kept at 4 °C. Cells providing the
highest level of Cy5 fluorescence signal (the top 0.5-5.0%) were collected and
resuspended at a density of more than 10° cells ml⁻¹ in Ca²⁺-free RPMI
containing 5% dialysed FCS, 45 U ml⁻¹ IL-3 and 50 μM 2-mercaptoethanol. After 24 h the
cells were resuspended in normal Ca²⁺-containing RPMI medium and expanded
for the next sorting.

Construction of the cDNA library. Total RNA was prepared from Ba/F3 PS19
cells with an RNeasy Mini Kit (Qiagen), and poly(A)⁺ RNA was purified with an
mRNA Purification Kit (GE Healthcare) with two cycles of oligo(dT)-cellulose
column chromatography. Double-stranded cDNA was synthesized with random
hexamers as primers, using a cDNA synthesis kit (SuperScript Choice System for
cDNA Synthesis; Invitrogen). A BstXI adaptor was attached, and the fragments
were size-fractionated by electrophoresis through a 1% agarose gel (Seakem GTG 
agarose; Lonza). DNA fragments longer than 2.5 kb were recovered from the gel 
with a DNA extraction kit (Wizard SV Gel and PCR Clean-up System; Promega) 
and ligated into a BstXI-digested pMXs vector**. Escherichia coli DH10B cells 
(ElectroMax DH10B; Invitrogen) were transformed by electroporation with a 
Gene Pulser (Bio-Rad). About 9.3 X 10° clones were produced, and plasmid 
DNA was prepared with a QIAfilter Plasmid Maxi Kit (Qiagen). 

Cell fusion. Ba/F3-PS0 and Ba/F3-PS19 cells were transduced with pMXs-puro
EGFP and pMXs-neo DsRed, respectively, and cultured in the presence of
1 μg ml⁻¹ puromycin or 1 mg ml⁻¹ G418. Ba/F3-PS0 EGFP cells and Ba/F3-PS19
DsRed cells were fused in the presence of PEG1500 and cultured in the presence
of 1 μg ml⁻¹ puromycin and 1 mg ml⁻¹ G418. The EGFP/DsRed double-positive
cells were sorted with a FACSAria.

Screening of cDNA library. Plasmid DNA (108 μg) from the cDNA library was
introduced by lipofection with FuGENE6 (Roche Diagnostics) into 7.2 X 10’
Plat-E packaging cells*! grown in eighteen 10-cm dishes. Two days after the
transfection, the viruses in the culture supernatant were centrifuged at 4 °C and
6,000g for 16 h, resuspended in RPMI1640 medium containing 10% FCS and
45 U ml⁻¹ IL-3, and used to infect 7.2 X 10° Ba/F3 cells in the presence of
8 μg ml⁻¹ Polybrene (Sigma-Aldrich). After a 24-h culture, the medium was replaced
with fresh medium, and the cells were further cultured for 2 days. The sorting of 
cells that were sensitive to ionophore-induced PtdSer exposure was performed as 
described above. 

Isolation of cDNA fragments from Annexin V-positive Ba/F3 cells. To isolate 
the cDNA integrated into the retroviral vector, the genomic DNA was extracted 
from Ba/F3 cell transformants with the Wizard Genomic DNA Purification System 
(Promega) and subjected to PCR with the Expand Long Template PCR System 
(Roche Diagnostics). The PCR primers (5′-CCCGGGGGTGGACCATCCTCT-3′
and 5′-CCCCTTTTTCTGGAGACTAAAT-3′) carried sequences from the pMXs
vector, and the conditions for PCR were 10 s at 96 °C, 30 s at 58 °C and 4 min at
68 °C for 35 cycles. The PCR fragments were cloned into the pGEM-T Easy vector 
(Promega) and subjected to DNA sequencing analysis with an ABI PRISM 3100 
Genetic Analyser (Applied Biosystems). 

Expression vector for TMEM16F and its mutants. The Flag-tag sequence was
integrated into the EcoRI and XhoI sites of the retroviral vector pMXs-puro, resulting
in pMXs-puro c-Flag. The full-length coding sequence for mouse TMEM16F
(GenBank accession number NM_175344) was prepared by RT-PCR with the
mRNA from Ba/F3 cells. The primers used were as follows (in each primer the
EcoRI recognition sequence is underlined):
5′-ATATGAATTCGACATGCAGATGATGACTAGGAA-3′ and
5′-ATATGAATTCGAGTTTTGGCCGCACGCTGT-3′.

The PCR fragments were inserted into the EcoRI site of pMXs-puro c-Flag, and
the authenticity of the cDNAs was verified by DNA sequencing.

For the expression plasmid of TMEM16F-mRFP, the coding sequence for
mRFP in pcDNA-mRFP (Invitrogen) was joined in-frame to the C terminus of
mouse TMEM16F and introduced into pMXs vector.

Expression in mouse Ba/F3 and human 293T cells. The expression vector for 
Flag-tagged TMEM16F in pMXs-puro was introduced into Plat-E cells. The retro- 
virus produced was concentrated as described above and used to infect Ba/F3 cells 
to establish stable transformants. The transformants were selected by culturing the 
cells in medium containing puromycin (1.0 μg ml⁻¹). To express
TMEM16F-mRFP, human 293T cells were transfected by lipofection with FuGENE6 with
the pMXs vector carrying the TMEM16F-mRFP sequence. One day later, the 
transfected cells were observed by fluorescence microscopy (BioRevo BZ-9000; 
Keyence). 

Western blotting. Cells were lysed in RIPA buffer (50 mM Hepes-NaOH pH 8.0 
containing 1% Nonidet P40, 0.1% SDS, 0.5% sodium deoxycholate, 150 mM NaCl 
and 10% protease inhibitor cocktail (Complete Mini; Roche Diagnostics)). The 
lysate was mixed with 5× SDS sample buffer (200 mM Tris-HCl pH 6.8, 10% SDS,
25% glycerol, 5% 2-mercaptoethanol, 0.05% bromophenol blue), boiled for 5 min 
and separated by electrophoresis on a 10% polyacrylamide gel (Bio Craft). After 
the proteins had been transferred to a poly(vinylidene difluoride) membrane 
(Millipore), the membranes were probed with horseradish peroxidase-conjugated 
mouse anti-Flag M2 (Sigma), and peroxidase activity was detected with a Western 
Lightning enhanced chemiluminescence system (PerkinElmer). 

RT-PCR of TMEM16F cDNA and sequencing of its chromosomal gene in a
patient with Scott syndrome. Total RNA was prepared from EBV-transformed
cell lines from a patient with Scott syndrome and from the patient’s parents, and
from a healthy control. The RNA was reverse-transcribed with Superscript III
(Invitrogen), in accordance with the manufacturer’s protocol, and the TMEM16F
cDNA was analysed by PCR with the following sets of primers (in each primer
the additional sequence is underlined):
Ex1-FW (5′-ATATGAATTCGACATGAAAAAGATGAGCAGGAA-3′),
Ex11/12-RV (5′-GCGTTCTTCTTCCTGAGTAA-3′),
Ex11/12-FW (5′-TTACTCAGGAAGAAGAACGC-3′),
Ex20-RV (5′-ATATGAATTCTTCTGATTTTGGCCGTAAAT-3′),
Ex12-FW (5′-TCTGTGCCAGTGCTGTCTTT-3′) and
Ex16-RV (5′-CTGCAGATGGTAGTCCTGTT-3′).
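
The Ex11/12 forward and reverse primers cover the same junction on opposite strands, so the two sequences should be exact reverse complements of each other. A minimal Python check of this, assuming only the primer sequences listed above (the helper name `reverse_complement` is introduced here for illustration):

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Primer sequences as listed in the Methods (written 5' to 3').
EX11_12_FW = "TTACTCAGGAAGAAGAACGC"
EX11_12_RV = "GCGTTCTTCTTCCTGAGTAA"

# True when the two primers anneal to the same 20-nt site on opposite strands.
primers_match = reverse_complement(EX11_12_FW) == EX11_12_RV
print(primers_match)
```

This kind of check is a quick way to catch transcription errors when primer sequences are copied across line breaks.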

For the sequence analysis of the human TMEM16F chromosomal gene,
genomic DNA was prepared from human cell lines and a 965-bp DNA fragment
carrying the 226-bp exon 13 and its 5′-flanking and 3′-flanking regions (about
370 bp each) was amplified by PCR with the following primers:
5′-CCAGAGTATGCTACTAGTTG-3′ and 5′-TCTCAGCAACCGAGGAACAT-3′.
The PCR products were purified with a Wizard SV PCR and Gel Clean-up
System. Cycle sequencing was performed with a BigDye Terminator v3.1 Cycle
Sequencing kit with a primer of 5′-GGACCTTACCGAAGTTAGTA-3′, and
analysed with an ABI PRISM 3100 Genetic Analyser.
Analysis of exposure of PtdSer and PtdEtn. To analyse the exposure of PtdSer
and PtdEtn, 10° cells at early exponential phase were washed with PBS, suspended
in 1.0 ml of cold Annexin V staining buffer with 2,500-5,000-fold diluted
Cy5-labelled Annexin V or 800-fold diluted biotin-Ro09-0198 (ref. 33) followed by
1.0 μg ml⁻¹ APC-labelled streptavidin and 5 μg ml⁻¹ PI. The samples were
incubated on ice for 15 min, and flow cytometry was performed on a FACSAria or
FACSCalibur as described above. For binding of MFG-E8, the cells were
suspended in RPMI1640 containing 10% FCS and then incubated on ice for 20 min
with Flag-tagged D89E mutant of MFG-E8 (0.4 μg ml⁻¹)**. The cells were washed
with the above medium and incubated on ice for 20 min with 1.0 μg ml⁻¹ hamster
monoclonal antibody against mouse MFG-E8 (clone 2422). This was followed by
incubation with phycoerythrin-labelled mouse anti-hamster IgG (BD Bioscience)
and analysis by flow cytometry with a FACSAria.

To study the requirement for intracellular Ca²⁺, 10° cells were incubated with
10 μM BAPTA-AM in RPMI1640 medium containing 10% FCS at 37 °C for 5 min
for the PtdSer exposure, or for 60 min for the PtdEtn exposure. The cells were
washed with Annexin V staining buffer, and stained with Cy5-Annexin V or
biotin-Ro09-0198 as described above.

For the kinetic study of the Ca²⁺-induced PtdSer and PtdEtn exposure, 10° cells
were washed with PBS, suspended in 1.0 ml of cold Annexin V staining buffer with
Cy5-labelled Annexin V or a mixture of biotin-Ro09-0198 and APC-labelled
streptavidin, and 5 μg ml⁻¹ PI. Cells were mixed on ice with A23187 at a final
concentration of 0.25 or 0.5 μM, and applied to the injection chamber of a
FACSAria that was set at 20 °C (for Ba/F3 cells) or 37 °C (for human cell lines)
to induce the A23187 reaction. Data were recorded for the indicated periods, and
the PI-positive cells were excluded from the analysis.
Internalization of NBD-PtdCho and NBD-SM. The internalization of NBD- 
lipid analogues was analysed by flow cytometry essentially as described in refs 8 
and 30. In brief, 10° cells were washed with HBSS and resuspended in 0.5 ml of
HBSS containing 2 mM CaCl₂ (HBSS-Ca). An equal volume of HBSS-Ca
containing 1 μM NBD-PtdCho or NBD-SM was added to the cell suspension and
incubated at room temperature. At each time point, 150 μl of cell suspension was
collected, mixed with 150 μl of the prechilled (4 °C) HBSS-Ca containing
5 mg ml⁻¹ fatty-acid-free BSA (Sigma-Aldrich), to extract the unincorporated
fluorescent lipids, and 500 nM Sytoxblue (Molecular Probes). To measure the total
fluorescence, samples were mixed with HBSS-Ca in the absence of BSA. After 
incubation for 10 min at 4°C to extract the lipid, the cells were analysed with a 
FACSAria for forward scatter, side scatter, logarithmic green fluorescence (NBD), 
and Sytoxblue fluorescence. The Sytoxblue-positive dead cells were excluded from 
the analysis. The fluorescence of NBD-phospholipids that were resistant to the 
BSA extraction was regarded as representing phospholipids that had been incor- 
porated into cells. 
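
The BSA-extraction readout described above reduces to a ratio of two fluorescence measurements: the signal that survives BSA extraction divided by the total signal measured without BSA. The Python sketch below illustrates that calculation only; the function name and the example intensities are hypothetical, and any background subtraction or gating used in the actual analysis is not represented.

```python
def percent_internalized(bsa_resistant: float, total: float) -> float:
    """Percentage of cell-associated NBD-lipid that resists BSA extraction.

    bsa_resistant: mean NBD fluorescence after extraction with BSA
    total:         mean NBD fluorescence of the same sample without BSA
    """
    if total <= 0:
        raise ValueError("total fluorescence must be positive")
    return 100.0 * bsa_resistant / total


# Hypothetical intensities, chosen only to illustrate the calculation.
example = percent_internalized(bsa_resistant=420.0, total=1000.0)
print(example)  # 42.0
```

A value above 40 in this sketch would correspond to the "more than 40%" internalization reported for the mutant-expressing cells.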

To examine the effect of the Ca²⁺ ionophore, 5 X 10° cells were washed with
HBSS-Ca, resuspended in 0.5 ml of cold HBSS-Ca, and incubated on ice for 7 min.
Cold HBSS (0.5 ml) containing 0.2 μM NBD-PtdCho or NBD-SM was added to
the cell suspension and incubated further on ice for 3 min. The cells were then
mixed with A23187 and incubated at room temperature to induce lipid
incorporation. A 150-μl aliquot was used to determine the incorporated lipid quantity as
described above. 
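
In both internalization protocols the NBD-lipid solution is mixed with an equal volume of cell suspension, so the working concentration is half the stock concentration. A short Python sketch of this dilution, assuming simple volumetric mixing (the function name is illustrative):

```python
def final_concentration(stock_conc: float, stock_vol: float, sample_vol: float) -> float:
    """Concentration after mixing a stock solution with a lipid-free sample.

    Concentrations are in uM and volumes in ml; only the ratio of the two
    volumes matters, so any consistent volume unit would work.
    """
    return stock_conc * stock_vol / (stock_vol + sample_vol)


# Constitutive uptake: 0.5 ml of 1 uM NBD-lipid added to 0.5 ml of cells.
constitutive = final_concentration(1.0, 0.5, 0.5)
# Ionophore-induced uptake: 0.5 ml of 0.2 uM NBD-lipid added to 0.5 ml of cells.
induced = final_concentration(0.2, 0.5, 0.5)
print(constitutive, induced)
```

The two results correspond to the 0.5 μM and 0.1 μM NBD-PtdCho concentrations quoted in the legend of Fig. 2f, g.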

Thin-layer chromatography. After incubation of cells with NBD-PtdCho or
NBD-SM, the phospholipids were extracted from the cells by incubation at room
temperature for 30 min with a mixture of chloroform, methanol and water (5:10:4,
by volume). The phospholipids were separated by thin-layer chromatography on a
silica gel 60 plate (Merck) with chloroform/acetone/methanol/acetic acid/water
(5:2:1:1:0.5 by volume) as a solvent. The fluorescence on the plate was detected
with a LAS4000 image analyser (Fuji Film).

Intracellular Ca²⁺ and Ca²⁺ influx. To determine the intracellular Ca²⁺
concentration, 10° cells were suspended in HBSS, incubated at 37 °C for 10 min with
0.4 μM Fluo-4-AM (Molecular Probes), washed with HBSS, and analysed with a
FACSAria.

The Ca²⁺ influx was measured as described**. In brief, 10° cells were labelled for
30 min at 37 °C with 1 μM Fluo-4-AM in RPMI containing 10% FCS. After being
washed with the Annexin V staining buffer, the cells were kept at 4 °C in Annexin V
staining buffer. The Ca²⁺ ionophore A23187 was added to the mixture at a final
concentration of 0.5 μM, and the change in mean fluorescence intensity was directly
recorded with a FACSCalibur system. The data were analysed with FlowJo Software.
shRNA. shRNA expression plasmids for mouse Tmem16f in a pRS shRNA vector 
carrying the puromycin-resistance gene were purchased from OriGene. The target 
sequence of the shRNA for Tmem16f was 5'-CATCTACTCTGTGAAGTTC 
TTCATTTCCT-3’. The scrambled non-effective shRNA (5’-GCACTACCAGA 
GCTAACTCAGATAGTACT-3’) in pRS was from OriGene. Ba/F3 cells were 
infected with retrovirus containing the shRNA, and cultured in the presence of 
1.0 μg ml⁻¹ puromycin. Puromycin-resistant cells were subjected to cloning by
limiting dilution. The Tmem16f mRNA was quantified by real-time PCR, and the
clones that showed the decreased expression were used for further study. 
TET2 is a close relative of TET1, an enzyme that converts 5-methylcytosine
(5mC) to 5-hydroxymethylcytosine (5hmC) in DNA’.
The gene encoding TET2 resides at chromosome 4q24, in a region 
showing recurrent microdeletions and copy-neutral loss of hetero- 
zygosity (CN-LOH) in patients with diverse myeloid malignancies’. 
Somatic TET2 mutations are frequently observed in myelodysplas- 
tic syndromes (MDS), myeloproliferative neoplasms (MPN), MDS/ 
MPN overlap syndromes including chronic myelomonocytic leuk- 
aemia (CMML), acute myeloid leukaemias (AML) and secondary 
AML (sAML)*””. We show here that TET2 mutations associated 
with myeloid malignancies compromise catalytic activity. Bone 
marrow samples from patients with TET2 mutations displayed uniformly
low levels of 5hmC in genomic DNA compared to bone
marrow samples from healthy controls. Moreover, small hairpin 
RNA (shRNA)-mediated depletion of Tet2 in mouse haematopoie- 
tic precursors skewed their differentiation towards monocyte/ 
macrophage lineages in culture. There was no significant difference 
in DNA methylation between bone marrow samples from patients 
with high 5hmC versus healthy controls, but samples from patients 
with low 5hmC showed hypomethylation relative to controls at the 
majority of differentially methylated CpG sites. Our results demon- 
strate that Tet2 is important for normal myelopoiesis, and suggest 
that disruption of TET2 enzymatic activity favours myeloid tumor- 
igenesis. Measurement of 5hmC levels in myeloid malignancies 
may prove valuable as a diagnostic and prognostic tool, to tailor 
therapies and assess responses to anticancer drugs. 

We transiently transfected HEK293T cells with Myc-tagged murine
Tet2 and assessed 5mC and 5hmC levels by immunocytochemistry
(Fig. 1 and Supplementary Figs 1-4). Myc-Tet2-expressing cells displayed
a strong increase in 5hmC staining and a concomitant decrease
in 5mC staining in the nucleus (Fig. 1b, c, quantified in Supplementary
Fig. 4). In contrast, 5hmC was undetectable or barely detected in nuclei
of cells expressing mutant Tet2 with H1302Y, D1304A substitutions in
the signature HxD motif"'?”” involved in coordinating Fe²⁺, and there
was no obvious decrease in nuclear 5mC staining (Fig. 1b, c and
Supplementary Fig. 4). These studies confirm that Tet2 is a catalytically
active enzyme that converts 5mC to 5hmC in genomic DNA”.

Mutations in TET2 residues H1881 and R1896, predicted to bind
Fe²⁺ and 2-oxoglutarate (2OG), respectively, have been identified
repeatedly in patients with myeloid malignancies**”"°. HEK293T cells
expressing Tet2 mutants H1802R and H1802Q (Fig. 1a and
Supplementary Fig. 2) showed greatly diminished 5hmC staining and no
loss of 5mC staining, consistent with participation of this residue in
catalysis (Fig. 1b, c and Supplementary Fig. 4a, b). We analysed
missense mutations identified in TET2 in our own (Supplementary
Table 1) and other studies**"' (P1367S, W1291R, G1913D, E1318G
and I1873T). HEK293T cells expressing Tet2 mutants P1287S,
W1211R or C1834D (Supplementary Figs 2 and 3a) displayed low 
5hmC staining and strong 5mC staining (Supplementary Figs 3b, c 
and 4c, d), indicating a role for these residues in the integrity of the 
catalytic or DNA binding domains. Cells expressing Tet2(R1817S/M) 
(Fig. 1a, Supplementary Figs 2 and 3a) were positive for 5hmC staining 
but changes in 5mC staining could not be assessed reliably (Fig. 1b, c, 
Supplementary Figs 3b, c and 4). 

To quantify these findings, we developed dot blot assays to detect 
5hmC in genomic DNA (Supplementary Fig. 5). In the first assay 
format, the blot was developed with a specific antiserum to 5hmC 
(Supplementary Fig. 5b, left), whose ability to recognize 5hmC 
depended strongly on the density of 5hmC in DNA (Supplementary
Fig. 5c, top). We therefore developed a more sensitive and quantitative 
assay in which DNA was treated with bisulphite to convert 5hmC to 
cytosine 5-methylenesulphonate (CMS)'* (Supplementary Fig. 5a), 
after which CMS was measured with a specific anti-CMS antiserum 
(Supplementary Fig. 5b, right). Unlike anti-5hmC, which reacted efficiently
only with DNA containing high densities of 5hmC, the anti-CMS
antiserum recognized DNA with an average of only a single
5hmC per 201 base pairs (Supplementary Fig. 5c, bottom). This lack 
of density dependence allowed us to plot the signal obtained with 
twofold dilutions of a standard oligonucleotide containing a known 
amount of 5hmC against the amount of CMS obtained after bisulphite 
conversion. We assumed 100% conversion efficiency’’ and used the 
linear portion of the standard curve to compute the amount of CMS, 
and therefore 5hmC, in the DNA samples (for example, see Fig. 2a, 
right). 
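The standard-curve step described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the dilution amounts, the signal values and the function names are hypothetical, and the only assumption carried over from the text is that the sample signal falls within the linear portion of the curve.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to recover the amount of CMS."""
    return (signal - intercept) / slope


# Hypothetical twofold dilution series of the standard oligonucleotide
# (pmol CMS) with made-up dot blot signals in arbitrary units.
standard_pmol = [0.25, 0.5, 1.0, 2.0]
standard_signal = [120.0, 220.0, 420.0, 820.0]

slope, intercept = fit_line(standard_pmol, standard_signal)

# Signal of a hypothetical patient sample, read within the linear range.
sample_pmol = amount_from_signal(620.0, slope, intercept)
```

Under the 100% conversion assumption stated in the text, the CMS amount recovered this way is taken as the 5hmC amount in the sample.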

To assess 5hmC levels, we obtained uniform populations of Tet2- 
expressing HEK293T cells by transfection with Tet2-IRES-CD25 
plasmid followed by magnetic isolation of CD25-expressing cells’. 
Wild-type and mutant Tet2 proteins were expressed at comparable 
levels (Fig. 1d and Supplementary Fig. 3d). Anti-5hmC/CMS dot blots
of genomic DNA revealed, as expected, that 5hmC was barely detectable
in DNA from cells transfected with empty vector; DNA from cells
expressing wild-type Tet2 showed a substantial increase in 5hmC and
a corresponding decrease in 5mC; and DNA from cells expressing the
HxD mutant Tet2 protein had very low 5hmC (Fig. 1e, Supplementary
Figs 3e and 6). DNA from cells expressing seven of the nine mutant
Tet2 proteins tested (H1802Q/R, R1817S/M, W1211R, P1287S and
C1834D) contained significantly less 5hmC than DNA from cells
expressing wild-type Tet2 (Fig. 1e, Supplementary Figs 3e and 6),
confirming our previous conclusion that these mutations impair
enzymatic activity.

We measured 5hmC (CMS) levels in genomic DNA extracted from 
bone marrow or blood (with >20% immature myeloid cells) of 88 
Figure 1 | The catalytic activity of Tet2 is compromised by mutations in
predicted catalytic residues. a, Schematic representation of TET2. The
catalytic core region contains the cysteine-rich (Cys-rich) and double-stranded
beta-helix (DSBH) domains. Three signature motifs conserved among 2OG-
and Fe²⁺-dependent dioxygenases are shown'”. Substitutions in the HxD
signature that impair the catalytic activity of TET1 (ref. 1), leukaemia-associated
mutations in the carboxy-terminal signature motifs, and corresponding
substitutions introduced into murine Tet2 are indicated. b, Tet2 expression
results in increased 5hmC by immunocytochemistry. HEK293T cells transfected
with Myc-tagged wild-type and mutant Tet2 were co-stained with antibody
specific for the Myc epitope (red) and antiserum against 5hmC (green). DAPI
(blue) indicates nuclear staining. c, Tet2 expression results in loss of nuclear 5mC
staining. HEK293T cells transfected with wild-type and mutant Myc-tagged
Tet2 were co-stained with antibody specific for the Myc epitope (green) and
antiserum against 5mC (red). d, Equivalent expression of wild-type and mutant
Myc-Tet2. CD25⁺ cells were isolated from HEK293T cells transfected with
bicistronic Tet2-IRES-human CD25 plasmids, and Tet2 expression in whole cell
lysates was detected by immunoblotting with anti-Myc. β-actin serves as a
loading control. e, Genomic DNA purified from CD25⁺ HEK293T cells
overexpressing wild-type or mutant Tet2 was treated with bisulphite to convert
5hmC to CMS (Supplementary Fig. 5a). CMS was quantified by dot blot assay
using anti-CMS and a synthetic bisulphite-treated oligonucleotide containing a
known amount of CMS. As positive and negative controls, we included DNA
from CD25⁺ HEK293T cells transfected with TET1 catalytic domain (TET1-CD)
or TET1-CD with mutations in the HxD motif (TET1-CD-HxDmut)'.


patients with myeloid malignancies and 17 healthy controls (Sup- 
plementary Table 1). In blinded experiments, DNA was treated with 
bisulphite and CMS levels were measured. TET2 mutations were 
strongly associated with low genomic 5hmC (Fig. 2 and Supplemen- 
tary Fig. 7a). To confirm these conclusions in a statistically rigorous 
Figure 2 | TET2 mutational status correlates with 5hmC levels in patients
with myeloid malignancies. a, Quantification of 5hmC by anti-CMS dot blot.
Left, a representative dot blot of genomic DNA isolated from bone marrow
aspirates of patients with MDS/MPN and TET2 mutational status as indicated.
A synthetic oligonucleotide with a known amount of CMS was used as
standard. Right, the linear portion of the standard curve was used to estimate
the amount of 5hmC in DNA from patient samples. b, Bar graph of data from
panel a. The three patients with TET2 mutations show lower 5hmC levels than
the three patients with wild-type TET2. Error bars indicate s.d. (n = 3).
c, Correlation of 5hmC levels with TET2 mutational status. CMS levels in bone
marrow samples from healthy donors and patients with myeloid malignancies
(Supplementary Table 1) are shown as the median of triplicate measurements
(Supplementary Fig. 7b). In the TET2 mutant group, squares, triangles,
diamonds and the star indicate homozygous, hemizygous, heterozygous and
biallelic heterozygous mutations, respectively (for detailed definition, see
Supplementary Methods). The horizontal bar indicates the median for each
group. P-values for group comparisons were calculated by a two-sided
Wilcoxon rank sum test. Patients bearing TET2 mutations show uniformly low
5hmC expression levels.


fashion, we tested samples for which a sufficient amount of DNA was 
available to make independent dilutions in triplicate, so that a median 
and standard deviation for 5hmC (CMS) levels in each patient could be
derived (Supplementary Fig. 7b). Analysis of DNA from 9 healthy 
donors and 41 patients (28 with wild-type TET2 and 13 with TET2 
mutations, Supplementary Table 1) revealed a strong, statistically sig- 
nificant correlation of TET2 mutations with low 5hmC (Fig. 2c). In 
contrast, samples from patients with wild-type TET2 showed a bimodal
distribution, with 5hmC levels ranging from ~0.4 to ~3.8 pmol
per µg DNA (Fig. 2c, Supplementary Fig. 7, also see Fig. 4).

We examined Tet2 expression in haematopoietic cell subsets isolated
from bone marrow and thymus of C57BL/6 mice (Supplementary
Figs 8 and 9). Tet2 mRNA was highly expressed in lineage-negative
(Lin⁻) Sca-1⁺ c-Kit⁺ (Sca-1 is also known as Ly6a) multipotent
progenitors (LSK), at levels similar to those in embryonic stem cells
(ESC). Expression was maintained at high levels in myeloid progenitors
(common myeloid progenitors, CMPs, and granulocyte-monocyte
progenitors, GMPs), was low in mature granulocytes
(Gr-1⁺ Mac-1⁺, also known as Ly6g and Cd11b or Itgam, respectively)
and high in monocytes (Gr-1⁻ Mac-1⁺) (Supplementary Fig. 9a,
middle panel).

To test the role of Tet2 in myelopoiesis, we transduced bone marrow 
stem/progenitor cells with Tet2 shRNA (Supplementary Fig. 10a), 
effectively downregulating Tet2 mRNA and protein relative to control 
cells transduced with empty vector or scrambled shRNA (Fig. 3a, b) 
(refer to Supplementary Fig. 10b for choice of Tet2 shRNA). Tet2 
depletion promoted expansion of Mac-1⁺ F4/80⁺ (also known as
Emr1) and Mac-1⁺ CD115⁺ (also known as Csf1r or M-CSFR,
macrophage colony-stimulating factor receptor) monocyte/macrophage cells
in the presence of G-CSF (granulocyte colony-stimulating factor) or
GM-CSF (granulocyte-macrophage colony-stimulating factor),
cytokines that support granulocyte and granulocyte/monocyte
development, respectively, but not in the presence of M-CSF (macrophage
Figure 3 | Tet2 regulates myeloid differentiation. a, b, Tet2 shRNA represses
Tet2 mRNA and protein expression. a, c-Kit⁺ stem/progenitor cells from bone
marrow of C57BL/6 mice were transduced with retroviruses (Supplementary
Fig. 10). After selection with puromycin for 3 days, Tet2 mRNA expression was
assessed by quantitative RT-PCR (PCR with reverse transcription). Error bars
show the range of duplicates. b, HEK293T cells were cotransfected with
expression plasmids encoding Myc-tagged Tet2 and retroviral shRNAs. Tet2
protein expression was quantified 48 h later by anti-Myc immunoblotting of
whole-cell extracts. c, Effect of Tet2 depletion on myeloid differentiation. Lin⁻
cells purified from bone marrow of C57BL/6 mice were transduced with control
(scramble) or shTet2 retroviruses, then grown in the presence of 50 ng ml⁻¹
stem cell factor (SCF), puromycin (2 µg ml⁻¹) and cytokines (10 ng ml⁻¹) as
indicated (also see Supplementary Fig. 10). After 4 days, flow cytometric
analysis of Mac-1 versus F4/80 (left panel) or CD115 (right panel) was
performed. All cells were GFP⁺ on the day of analysis.
colony-stimulating factor), which promotes growth of monocytic
progenitors (Fig. 3c and Supplementary Fig. 10d). Simultaneous treatment
with GM-CSF and M-CSF, or GM-CSF and G-CSF, also led to
increased numbers of monocyte/macrophage cells (Fig. 3c). These
results indicate that Tet2 has an important role in normal myelopoiesis.
However, Tet2 does not markedly influence short-term proliferation
of myeloid-lineage cells: when shRNA-transduced Lin⁻ cells
were cultured in the presence of GM-CSF and pulse-labelled with
bromodeoxyuridine (BrdU), Tet2 depletion promoted monocyte/macrophage
expansion but CD115⁺ (M-CSFR⁺) cells from the two
cultures showed no difference in acute BrdU incorporation
(Supplementary Fig. 11).

We asked whether 5hmC levels in tumour samples correlated with 
DNA methylation status. A histogram of normalized values from 88 
patients and 17 healthy individuals showed the expected bimodal distri- 
bution (see Supplementary Methods): healthy controls and most patient 
samples with wild-type TET2 had high 5hmC, whereas the majority of 
patient samples with mutant TET2 had low 5hmC (Fig. 4b). The DNA 
methylation status of 62 samples was interrogated at 27,578 CpG sites. As 
expected’®, the resulting histograms were strikingly bimodal, with sites 
within and outside CpG islands showing low and predominantly high 
methylation, respectively (Fig. 4c). Comparison of 28 control samples 
with 24 high 5hmC tumour samples (22 wild-type TET2, 2 mutant 
TET2) showed no significant difference in DNA methylation; in contrast, 
comparison of the control samples with 29 low 5hmC tumour samples (7 
wild-type TET2, 22 mutant TET2) yielded 2,512 differentially methylated
sites, of which the majority (2,510 sites) were hypomethylated
compared to controls (Fig. 4d and Supplementary Table 2). Thus 
TET2 loss-of-function is predominantly associated with decreased 
methylation at CpG sites. 
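The split into low and high 5hmC groups relies on a bimodal distribution whose two modes are separated at a local minimum of a Gaussian kernel density estimate (see the legend of Fig. 4b). A minimal sketch of that idea follows. The bandwidth, the grid resolution, the helper names and the data values are illustrative assumptions; the source does not state the settings that were used.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


def bimodal_threshold(values, bandwidth, steps=1000):
    """Locate the deepest interior local minimum of the density on a grid."""
    density = gaussian_kde(values, bandwidth)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = [density(x) for x in grid]
    minima = [i for i in range(1, steps)
              if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]]
    if not minima:
        return None  # the density is not bimodal at this bandwidth
    return grid[min(minima, key=lambda i: dens[i])]


# Hypothetical normalized 5hmC values forming two well-separated clusters.
values = [0.10, 0.15, 0.20, 0.25, 0.30, 0.80, 0.85, 0.90, 0.95, 1.00]
threshold = bimodal_threshold(values, bandwidth=0.08)
low_group = [v for v in values if v < threshold]
high_group = [v for v in values if v >= threshold]
```

The result depends on the bandwidth: too small a value produces spurious minima, and too large a value merges the two modes, in which case the function returns `None`.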

To summarize, our studies demonstrate a strong correlation between 
myeloid malignancies and loss of TET2 catalytic activity. The leuk- 
aemia-associated missense mutations associated with diminished 
5hmC levels provide clues to the structure of the TET2 catalytic domain. 
The W1211R, P1287S and C1834D mutations affect positions that are 
highly conserved within the catalytic domain of the TET subfamily of 
dioxygenases*: W1211 is located at the beginning of the strand just 
amino-terminal to the core of the double-stranded beta-helix (DSBH), 
and is predicted to constitute part of the ‘mouth’ of the active site pocket 
of the enzyme; P1287 is predicted to stabilize the conformation of the 
junction between the N-terminal helix and the first core strand of the 
DSBH; and G1913/C1834 is predicted to be the N-terminal capping 
residue of a helix that lines the ‘mouth’ of the DSBH and potentially 
interacts with substrate DNA’. The E1238G mutation had no detectable 
effect on 5hmC production in our overexpression assays; however, the
patient with this mutation also showed CN-LOH spanning 4q24, a 
feature that likely contributes to the significant reduction in 5hmC levels 
observed in the bone marrow. 

Low 5hmC levels were observed in a subset of patients with apparently 
wild-type TET2, whose clinical phenotypes resembled those of patients 
with mutant TET2. In several of these patients, TET2 mRNA expression 
was not significantly different from controls; mutations in other TET 
proteins have not been described (Supplementary Text). Some patients 
in the wild-type TET2/low 5hmC category may harbour mutations in 
regulatory or partner proteins for TET2, or in cis-regulatory regions 
controlling TET2 mRNA expression. Alternatively, the primary event 
in some of these patients may be CpG hypomethylation, resulting in 
decreased 5hmC secondary to depletion of the substrate, 5mC. 

There is little consensus on whether TET2 mutations correlate with 
clinical outcome. One study reported an association with decreased sur- 
vival in AML’, whereas others report little prognostic value in MPN 
diseases”!*!”. Assays for 5hmC may increase our options for the molecular
classification of myeloid malignancies, making it possible to ask
whether patients with high or low levels of genomic 5hmC show differences
in disease progression or therapeutic response. Notably, histone
deacetylase and DNA methyltransferase inhibitors show clinical efficacy 
Figure 4 | Relation of 5hmC levels to DNA methylation status.
a, Normalized 5hmC (CMS) levels in DNA from three different groups: healthy
controls (black diamonds), patients with mutant TET2 (red symbols) and 
patients with wild-type TET2 (blue circles). Among TET2 mutants, we 
distinguish homozygous (squares), hemizygous (triangles), heterozygous (small 
diamonds) and biallelic heterozygous (star) mutations (for definitions see 
Supplementary Methods). The horizontal bar indicates the median for each 
group. The number of samples in each group is indicated. b, Histogram of 
normalized 5hmC (CMS) levels in DNA from healthy donors (black diamonds), 
patients with mutant TET2 (red rectangles) and patients with wild-type TET2 
(blue circles). The frequency was calculated based on a Gaussian kernel estimator. 
The local minimum between both modes was used as a threshold (vertical dotted 
line) between low and high 5hmC values. c, Density of methylation values for 
healthy controls (black), high 5hmC samples (blue) and low 5hmC samples (red) 
of all sites (top panel), sites outside CpG islands (middle panel) and sites inside 
CpG islands (lower panel). d, Box plot for group-specific methylation for the only 
two hypermethylated sites (SP140, AIM2; top panel) and the top nine 
hypomethylated sites (lower panels) between healthy controls and low 5hmC 
samples (total number of differentially methylated sites was 2,512). 


in patients with CMML and AML”; and genomic 5hmC levels could 
potentially be a useful prognostic indicator or predictor of patient res- 
ponses or refractoriness to ‘epigenetic’ therapy with demethylating agents. 

DNA methylation is highly aberrant in cancer’*°. Because TET 
operates on 5mC, we were surprised to find that TET2 loss-of-function 
in myeloid tumours was associated with widespread hypomethylation 
rather than the expected hypermethylation at differentially-methylated 
CpG sites. Tumour samples with low 5hmC may have expanded cells
with localized hypomethylation at these sites, or TET2 may control 
DNA methylation indirectly, for instance by regulating the expression 
or recruitment of one or more DNA methyltransferases, perhaps via 
5hmC-binding proteins. Alternatively, if TET2 and 5hmC are required 
for cells to exit the stem cell state, loss of TET2 function in myeloid 
neoplasms may reactivate a stem-like state characterized by generalized 
hypomethylation and consequent genomic instability*'’’. Indeed, 
hypomorphic DNMT1 mutations associated with genome-wide DNA 
hypomethylation skew haematopoietic differentiation towards myeloerythroid
lineages”’, and promote the development of aggressive T-cell
lymphomas due to activation and insertion of endogenous retroviruses***.
Further studies of the role of TET2 in haematopoietic
differentiation should uncover the relation between TET2 loss-of- 
function, DNA methylation changes and myeloid neoplasia. 


METHODS SUMMARY 

Patient samples. Genomic DNA was extracted from bone marrow/ peripheral 
blood samples from healthy donors and patients with MDS, MDS/MPN, primary 
and secondary AMLs. Clinical features and other detailed information pertaining 
to the patient samples are summarized in Supplementary Table 1. 

Quantitative analysis of 5hmC and CMS levels using dot blot. For CMS detection,
genomic DNA was treated with sodium bisulphite using the EpiTect Bisulfite
kit (Qiagen). DNA samples were denatured and twofold serial dilutions were spotted 
on a nitrocellulose membrane in an assembled Bio-Dot apparatus (Bio-Rad). The 
blotted membrane was washed, air-dried, vacuum-baked, blocked and incubated 
with anti-5hmC or anti-CMS antibody (1:1,000) and horseradish peroxidase- 
conjugated anti-rabbit IgG secondary antibody. To ensure equal spotting of total 
DNA on the membrane, the same blot was stained with 0.02% methylene blue in 
0.3 M sodium acetate (pH 5.2). To compare results obtained in different experiments, 
we used the normalization procedure described in Supplementary Methods (see 
Fig. 4a, b, which incorporate data from Fig. 2 and Supplementary Fig. 6).

Methylation analysis. The DNA methylation status of bisulphite-treated genomic
DNA was probed at 27,578 CpG dinucleotides using the Illumina Infinium 27k 
array (Illumina)**. Methylation status was calculated from the ratio of
methylation-specific and demethylation-specific fluorophores (β-value) using BeadStudio
Methylation Module (Illumina). We removed sites on the Y and X chromosomes
from the analysis because of inconsistent methylation status with respect to gender
(a known problem based on communication with Illumina). Calculations are based
on β values, which correspond to the methylation status of a site ranging from 0 to 1,
returned by Illumina's BeadStudio software. We tested sites for differential
methylation using an empirical Bayes approach employing a modified t-test (LIMMA).
The false discovery rate (FDR) is controlled at a level of 5% by the
Benjamini-Hochberg correction.
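For readers unfamiliar with the Benjamini-Hochberg step-up procedure named above, it can be sketched as follows. This illustrates only the multiple-testing correction, not the LIMMA moderated t-test, and the p-values are made up for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking hypotheses rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-site p-values, in arbitrary order.
p_values = [0.6, 0.001, 0.038, 0.012, 0.035]
rejected = benjamini_hochberg(p_values, fdr=0.05)
```

In this example the site with p = 0.035 misses its own rank threshold (0.03) but is still rejected, because the site ranked after it (p = 0.038) meets its threshold (0.04). That is the step-up behaviour that distinguishes the procedure from a fixed per-test cutoff.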
Design, function and structure of a monomeric ClC transporter

Janice L. Robertson¹, Ludmila Kolmakova-Partensky¹ & Christopher Miller¹


Channels and transporters of the ClC family cause the transmembrane
movement of inorganic anions in service of a variety of biological
tasks, from the unusual (the generation of the kilowatt
pulses with which electric fish stun their prey) to the quotidian
(the acidification of endosomes, vacuoles and lysosomes)’. The
homodimeric architecture of ClC proteins, initially inferred from
single-molecule studies of an elasmobranch Cl⁻ channel? and later
confirmed by crystal structures of bacterial Cl⁻/H⁺ antiporters*, is
apparently universal. Moreover, the basic machinery that enables
ion movement through these proteins (the aqueous pores for anion
diffusion in the channels and the ion-coupling chambers that
coordinate Cl⁻ and H⁺ antiport in the transporters) is contained
wholly within each subunit of the homodimer. The near-normal
function of a bacterial ClC transporter straitjacketed by covalent
cross-links across the dimer interface and the behaviour of a concatemeric
human homologue argue that the transport cycle resides within each
subunit and does not require rigid-body rearrangements between
subunits*®. However, this evidence is only inferential, and because
examples are known in which quaternary rearrangements of
extramembrane ClC domains that contribute to dimerization modulate
transport activity’, we cannot declare as definitive a ‘parallel-pathways’
picture in which the homodimer consists of two single-subunit
transporters operating independently. A strong prediction
of such a view is that it should in principle be possible to obtain a
monomeric ClC. Here we exploit the known structure of a ClC Cl⁻/H⁺
exchanger, ClC-ec1 from Escherichia coli, to design mutants that
destabilize the dimer interface while preserving both the structure and
the transport function of individual subunits. The results demonstrate
that the ClC subunit alone is the basic functional unit for
transport and that cross-subunit interaction is not required for
Cl⁻/H⁺ exchange in ClC transporters.


To develop a strategy for generating a monomeric ClC protein, we
examined the structure of ClC-ec1 (Fig. 1) for candidate residues
mediating dimerization. This homologue is well suited to our purpose
because its dimerization interface is almost completely membrane
embedded, the large intracellular carboxy-terminal domain found in
some ClC proteins being absent here. The interface is formed mainly
by four helices running roughly perpendicular to the membrane to
create a flat, nonpolar surface of ~1,200 Å² (Fig. 1). Most cross-subunit
contacts are made by interdigitated leucine and isoleucine side chains;
residues capable of forming hydrogen bonds or salt bridges are absent.
The protein’s phospholipid-facing residues are also nonpolar (Fig. 1), a
circumstance that invites questions of how such chemically similar
surfaces so faithfully choose their respective protein and lipid partners
in the dimer. Such questions have motivated extensive studies of
transmembrane peptide dimerization*”’, which identified shape
complementarity as an important determinant of helix packing specificity
within membranes and micelles. Shape complementarity of the
ClC-ec1 dimer interface is high, scoring at levels seen for protein-antibody
contacts and several membrane protein oligomers (Supplementary
Table 1). Accordingly, our design strategy seeks to destabilize the dimer
by placing steric mismatches on the ClC subunit interface. A second
element of the strategy aims at favouring the interface’s exposure to the
lipid bilayer. Lipid-facing surfaces of many membrane proteins are
known to present amphiphilic tryptophan or tyrosine side chains to
the chemically heterogeneous transition zone where the lipid acyl
chains connect to the polar head groups, as seen for ClC-ec1 in
Fig. 1; membrane-thermodynamic analysis of tryptophan analogues
establishes that the aromatic, bifunctional character of this side chain
favours its location at the phospholipid bilayer’s transition zone’.

With these considerations in mind, we adopted a ‘warts-and-hooks’ 
strategy for engineering a monomeric ClC by introducing tryptophan




Figure 1 | Structure and dimeric interface of ClC-ec1. a, ClC-ec1 dimer
(Protein Data Bank ID, 1OTS) is shown with subunits in grey and blue, with
hydrophobic residues highlighted in yellow, and with tryptophan and tyrosine in
magenta. The level of the membrane (extracellular side up) is indicated by black
lines. Previously proposed transport pathways are shown for Cl⁻ and H⁺.
b, Single subunit rotated 90° to view the dimerization interface head-on. The four
interface helices (residues 192-204, 215-232, 405-416 and 422-440) are shown
in red and the side chains involved in cross-subunit contacts are shown in yellow.
mutations on the subunit interface near the level of the lipid head
groups. This type of substitution simultaneously offers two kinds of
perturbation: steric disruption of the contact surface’s shape
complementarity and enhanced affinity of this surface for the lipid bilayer. We
constructed eight single tryptophan substitutions for leucine or
isoleucine near the extracellular and intracellular ends of the four
dimerization helices (Fig. 2a). All but one of these mutants express at near
wild-type levels, and the oligomerization state of each was analysed in
decylmaltoside micelles on a size exclusion column calibrated with a
panel of membrane transport proteins’? (Fig. 2b). The wild-type
homodimer (100 kDa) elutes, as expected, at 12.8 ml, and the
50-kDa monomer is predicted to elute about 1 ml later. One mutant,
Ile422Trp, shifts precisely to the presumed monomer position, with
a minor dimer peak also apparent. Three other mutants, Ile201Trp,
Leu406Trp and Leu434Trp, show broader, asymmetric peaks
centred between dimer and monomer positions. The remaining three
mutants all run as dimers (data not shown). In hopes of further
stabilizing a monomer, we tested the double mutant Ile201Trp/Ile422Trp,
which if dimeric would place four ‘warts’ within the subunit
contact region, and if monomeric would offer two ‘hooks’ to the
bilayer, one on each side of the membrane. This mutant, henceforth
denoted WW, cleanly shifts to the monomer position with no observable
dimer peak. The oligomeric nature of this double mutant in detergent
micelles was further assessed by treatment with glutaraldehyde, a
promiscuous crosslinker known quantitatively to produce covalent
dimers of ClC-ec1", as illustrated for the wild type by SDS-polyacrylamide
gel electrophoresis (Fig. 2c). In contrast, glutaraldehyde treatment
fails to shift WW to the covalent-dimer position, thereby
identifying it as a monomer in detergent.

To identify the oligomeric state of WW in lipid bilayers, we repeated 
glutaraldehyde crosslinking experiments on this protein reconstituted 
into liposomes. Phosphatidylcholine-phosphatidylglycerol mixtures 


[Figure 2 panel residue: b, elution profiles for WT, I201W, L406W, I422W, L434W and WW; x axis, elution volume (ml), 11 to 16. c, gel lanes for WT and WW with glutaraldehyde at 1 min and 30 min, and in SDS.]


Figure 2 | Behaviour of tryptophan mutants in detergent. a, Schematic of
the dimerization interface showing the positions of the tryptophans tested.
Leu194Trp did not express protein. b, Chromatographic profiles of the various
mutants on a Superdex 200 column. Vertical lines mark elution volumes for
dimer (dashed) and monomer (solid). WT, wild type. c, 10% SDS-polyacrylamide
gel electrophoresis of wild-type and WW samples, Coomassie
stained. Bars indicate samples at 0.25 mg ml⁻¹ treated with 0.125%
glutaraldehyde, 150 mM NaCl and 50 mM Na phosphate, pH 7.0, for the
indicated times in 5 mM decylmaltoside or, as a negative control, in 2% SDS.
Crosslinking is nearly complete after 1 min, and no higher oligomers appear
even after 30 min.
were used here to avoid lipid-associated amino groups that would
confound the glutaraldehyde reaction. We also aimed in these experi- 
ments to approximate Poisson-dilution conditions”, wherein a low 
protein/lipid ratio is used so that most liposomes are protein free and 
any liposome containing protein carries only a single transporting 
unit. Under such conditions, each liposome becomes a single-molecule 
reaction vessel in which intramolecular crosslinking is favoured. As 
shown in Fig. 3a, crosslinking in liposomes recapitulates the detergent 
results, thereby showing that WW is also monomeric in these bilayer 
membranes. 
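The Poisson-dilution argument can be made concrete with a short calculation. The sketch below assumes that transporting units distribute over liposomes according to a Poisson law. The mean occupancy of 0.1 units per liposome is a hypothetical value chosen for illustration, not one reported in the text.

```python
import math


def poisson_pmf(k, mean):
    """Probability of finding exactly k units in a liposome."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean_units_per_liposome):
    """Fractions of empty, singly and multiply occupied liposomes."""
    empty = poisson_pmf(0, mean_units_per_liposome)
    single = poisson_pmf(1, mean_units_per_liposome)
    multiple = 1.0 - empty - single
    # Among liposomes that contain protein, the share holding exactly one unit.
    single_given_occupied = single / (1.0 - empty)
    return empty, single, multiple, single_given_occupied


# Hypothetical low protein/lipid ratio: 0.1 units per liposome on average.
empty, single, multiple, single_given_occupied = occupancy(0.1)
```

At this assumed mean, roughly nine in ten liposomes are protein free, and about 95% of the occupied liposomes carry a single unit, which is the regime in which crosslinking within one liposome reports on the oligomeric state of that unit alone.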

The experiments above establish the WW mutant as monomeric but
do not address its conformational or functional character. We therefore
performed two mechanistically diagnostic ion-transport measurements
in the same liposome environment as was used for the
crosslinking experiments. The unitary passive Cl⁻ transport rate was
determined in a ‘Cl⁻ dump’ experiment”, in which liposomes with
high Cl⁻ concentration are suspended in low-Cl⁻ solution in the
presence of H⁺ and K⁺ ionophores, to prevent a pH gradient build-up
and to maintain zero voltage. Under these conditions, the unitary
Cl⁻ efflux rate of wild-type protein, measured electrochemically by the
appearance of Cl⁻ in the external solution (Fig. 3b), is ~300 s⁻¹; Cl⁻
turnover by WW is roughly half of this value (160 s⁻¹), a respectable
activity. Furthermore, anion specificity of transport is maintained in
WW, as Cl⁻ efflux is fully dependent on addition of K⁺ ionophore.
ClC-ec1 is a coupled Cl⁻/H⁺ exchanger, in which a pre-established
Cl⁻ gradient can be used to pump H⁺ thermodynamically ‘uphill’.
The WW monomer retains this defining feature of the transport
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Figure 3 | Monomeric CIC mutant in phospholipid membranes. 

a, Glutaraldehyde crosslinking of wild-type ClC-ec1 and the WW mutant in
liposomes. Glutaraldehyde treatment was as in Fig. 2, except that protein was
incorporated into phosphatidylcholine-phosphatidylglycerol liposomes, and
gel was silver stained. b, Passive Cl⁻ efflux from reconstituted liposomes for
wild-type ClC-ec1 and the WW mutant. Traces show release of Cl⁻ from
liposomes loaded with 300 mM Cl⁻ into the extraliposomal solution
(containing 1 mM Cl⁻), initiated by 0.5 µM valinomycin (downward
arrowhead), normalized to the level of complete release on disrupting
liposomes with 50 mM octylglucoside (upward arrowhead). Unitary turnover
calculated on a per-subunit basis from the initial rate of Cl⁻ release was
290 ± 30 s⁻¹ for wild type, 160 ± 9 s⁻¹ for WW (mean ± s.e.m., N = 9). [Cl⁻],
Cl⁻ concentration. c, Cl⁻-driven H⁺ pumping against a pH gradient.
Liposomes loaded with 300 mM Cl⁻, pH 5.0, were suspended in 1 mM Cl⁻, pH
5.2, and transport was initiated by valinomycin (upward arrowhead) and
terminated by carbonyl cyanide-p-trifluoromethoxyphenylhydrazone (FCCP,
downward arrowhead), while the pH of the suspension was recorded. Upward
deflection represents uptake of H⁺ into liposomes.

Figure 4 | Crystal structure of the WW monomer. a, View of two monomers
in side-by-side contact, with interface helices highlighted in red, Cl⁻ ion
highlighted in green and additional symmetry-related monomers shown in
grey in the background. b, Backbone alignment (Cα root mean squared
deviation, 0.6 Å) of the WW monomer (yellow, with interface helices in red)
with a single subunit of wild-type ClC-ec1 (grey). Blue spheres indicate the


mechanism. As shown in Fig. 3c, Cl⁻-loaded liposomes are suspended
in low-Cl⁻ medium, and transport is initiated by depolarizing the
liposomes with K⁺ ionophore. As Cl⁻ flows out, H⁺ enters against a
pH gradient, as detected by alkalinization of the extraliposomal medium,
which is swiftly reversed by addition of a proton ionophore. We
established Cl⁻/H⁺ exchange stoichiometry from the ratio of initial
flux rates (Supplementary Fig. 1): 2.0 ± 0.1 for the wild-type control, as
expected from the two-to-one stoichiometry determined in E. coli
lipids, and a similar value, 2.3 ± 0.3, for WW. The preservation
of H⁺-coupled Cl⁻ antiport in the monomeric construct directly
establishes that the ClC subunit contains all essential components of
the transport mechanism. The possibility remains that side-chain
movements at the dimer interface in wild-type homodimer may occur
during the transport cycle, as indicated convincingly by recent ¹⁹F
NMR experiments, but our results demonstrate that such movements
cannot represent functionally obligatory cross-subunit interactions.
We crystallized the WW mutant, collected X-ray diffraction data to a
resolution of 3.1 Å and solved the structure by molecular replacement
using the wild-type subunit as search model (crystallographic statistics
are shown in Supplementary Table 2). The asymmetric unit consists of a
single monomer whose previously buried dimer interface is now com- 
pletely exposed to detergent-containing solvent. This exposed interface 
is shown in Fig. 4a (also see Supplementary Fig. 2) for a symmetry- 
related pair of monomers, whose contacts in the unit cell arise from 
crystal geometry and are not seen in crystals of wild-type ClC-ec1. We
consider it remarkable that the monomer’s 18 membrane-embedded 
helices align precisely with those of the wild-type subunit in the homo- 
dimer (Fig. 4b), despite the absence of native cross-subunit interactions. 
Only the cytoplasmic amino-terminal helix (residues 22-30), which in 
the wild type engages in a domain swap with its twin subunit, veers off in 
a different direction to accommodate crystal packing. Moreover, most 
side chains projecting from the exposed subunit interface are well 
ordered and unperturbed from their buried positions in the wild-type 
dimer, except for a single tyrosine, which adopts a different rotamer to 
make room for one of the substituted tryptophans (Supplementary Fig. 
3). Unambiguous density for the mechanistically crucial central Cl⁻ ion
appears in the monomer at the same position as in the wild type,
coordinated by the central serine and tyrosine residues (Fig. 4 and
Supplementary Fig. 2); however, Cl⁻ density is lower in WW than in
wild-type data sets of similar crystallographic quality, perhaps because
crystallization of the monomer requires the additional presence of
NO₃⁻, a transported anion known to compete with Cl⁻ (refs 13, 19).

N termini of the visible structures. c, Central anion-binding site. The 2Fo−Fc
map (blue, 1.5σ) is shown near the central Cl⁻-binding site, with coordinating
residues Ser 107, Glu 148 and Tyr 445 highlighted (yellow); the positive
difference density calculated from a Cl⁻-omit map (green) shows a strong peak
(3.5σ) at the position of the central Cl⁻ ion in the wild type. Stereo versions of
panels a and c can be found in Supplementary Fig. 2.


A perplexing clash of form and function arises from this demonstration
that the isolated ClC subunit is transport competent: why then are
all known ClC proteins homodimers? With the steady expansion of the
membrane protein structural database, it is becoming apparent that the
parallel-pathways theme discussed here for ClCs appears in many families
of channels and transporters. For instance, aquaporin channels are
homotetramers with a diffusion pore in each subunit, FNT-family
formate channels are five-pore pentamers and UT-family urea
channels, Amt-type ammonia channels and outer-membrane porins
are three-pore trimers. Among membrane transporters, a striking
example is found in five phylogenetically unrelated families whose
transporting subunits share a common structural fold but variously
assemble as monomers, dimers or trimers. A survey of the current
literature identifies no fewer than fourteen separate families (~40% of
structurally known membrane transport protein families) built on this
parallel-pathway principle, with subunits held together through
extended, nonpolar intramembrane contacts. We are loath to offer
any suggestion for the ‘meaning’ (evolutionary or physiological) of
this emerging structural theme; in only one case, a trimeric Na⁺-coupled
aspartate transporter of the EAAT superfamily, has parallel-pathway
architecture been plausibly proposed as essential for substrate
transport by the individual subunits making up the complex.

Although our warts-and-hooks design succeeded in severing the 
ClC dimer, we do not claim to understand the thermodynamic reasons
for its success. The energetic components governing how a greasy
protein surface chooses its greasy protein partner over a greasy lipid
bilayer are still unparsed. Previous attempts to attack this fundamental
problem of membrane protein chemistry have focused on model systems
of single transmembrane helical peptides, and most have
been quantifiable only in detergent micelles. The ClC interface introduced
here may provide future opportunities to examine the molecular
forces operating in transmembrane helix packing, folding and recognition
in the context of a complex integral membrane protein.


METHODS SUMMARY 


Expression in E. coli, purification and liposome reconstitution of ClC-ec1 (Swiss-
Prot ID, P37019) were performed as described, as were Cl⁻ and H⁺ flux assays,
except that we used lipid mixtures of egg phosphatidylcholine and 1-palmitoyl,
2-oleoyl phosphatidylglycerol in a 3/1 weight ratio. Ion flux rates in these lipids are
5-10-fold lower than observed in the E. coli phospholipids that we customarily use.
Liposomes were formed at 20 mg ml⁻¹ lipid, 1 µg protein per milligram lipid by
dialysis, or by centrifugation of 0.1-ml samples through 3-ml Sephadex G-50
columns. Cl⁻/H⁺ exchange stoichiometry was determined as the ratio of initial
transport rates, with Cl⁻ efflux and H⁺ uptake recorded by means of Cl⁻ and H⁺
electrodes using liposomes loaded with 300 mM KCl and 40 mM citrate-NaOH,
pH 5.0, suspended in solutions of 1 mM KCl, 300 mM K isethionate and 2 mM
citrate-NaOH, pH 5.2. For crystallography, 1 µl ClC protein (9-15 mg ml⁻¹) in
100 mM NaCl, ~40 mM decylmaltoside and 10 mM Tris-HCl, pH 7.5, was mixed
with an equal volume of 100 mM LiNO₃, 41-45% (w/v) PEG400, 100 mM glycine-
NaOH, pH 9.5, and ~10 mM 4-cyclohexyl-1-butyl-β-D-maltoside was added to
the 2-µl drop. Crystals grown by vapour diffusion in sitting drop trays for 2-4
weeks at 20 °C were frozen in liquid nitrogen, and data were collected remotely at
beamline 8.2.1 of the Advanced Light Source Eastern Annex, Waltham,
Massachusetts. Data were processed in HKL2000. Molecular replacement was
done in PHASER using residues 30-450 of a single subunit of ClC-ec1 as search
model, and refinement was carried out in REFMAC5.

The human interactome contains more than 100,000 protein interactions, only a fraction of which are known.


Interactome under 
construction 


Developing techniques are helping researchers to build the 
protein interaction networks that underlie all cell functions. 


BY LAURA BONETTA 


An old adage says: “Show me your friends,
and I’ll know who you are.” In the same
way, finding interaction partners for
a protein can reveal its function. To that end, 
researchers are now building entire networks of 
protein-protein interactions. Unlike biological 
pathways, which represent a sequence of molec- 
ular interactions leading to a final result — for 
example, a signalling cascade — networks are 
interlinked. Represented as starbursts of protein
‘nodes’ linked by interaction ‘edges’ to form
intricate constellations, they provide insight into 
the mechanisms of cell functions. Furthermore, 
placing proteins encoded by disease genes into 
these networks will let researchers determine 
the best candidates for assessing disease risk 
and targeting with therapies. 

“This is the next step after the Human 
Genome Project,” says Trey Ideker, a systems
biologist at the University of California, San 
Diego, and principal investigator at the National 
Resource for Network Biology, which provides 


open-source software for network visualiza- 
tion. “That effort identified 30,000 genes, but 
that is not the end goal. How the genes work in 
pathways and how these pathways function in 
disease states and development is the end goal. 
To accomplish this we will need to systemati- 
cally map gene and protein interactions.” 

Unlike the genome, the interactome — the 
set of protein-to-protein interactions that 
occurs in a cell — is dynamic. Many inter- 
actions are transient, and others occur only 
in certain cellular contexts or at particular 
times in development. The interactome may 
be tougher to solve than the genome, but the 
information, researchers say, is crucial for a 
complete understanding of biology. 


THE RIGHT PARTNERS 

At any time, a human cell may contain about 
130,000 binary interactions between proteins.
So far, a mere 33,943 unique human protein- 
protein interactions are listed on BioGRID 
(http://thebiogrid.org), a database that stores 
interaction data. Clearly, there is work to do. 
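
As a rough illustration of the gap, the two figures quoted above can be compared directly. This is only a back-of-envelope sketch: it assumes the database count and the per-cell estimate are comparable, which may not hold strictly because the database is not limited to binary interactions in a single cell.

```python
ESTIMATED_BINARY_INTERACTIONS = 130_000  # per-cell estimate quoted in the text
CATALOGUED_INTERACTIONS = 33_943         # BioGRID count quoted in the text

def coverage(catalogued, estimated_total):
    """Fraction of the estimated interactome that has been catalogued."""
    if estimated_total <= 0:
        raise ValueError("estimated total must be positive")
    return catalogued / estimated_total

fraction = coverage(CATALOGUED_INTERACTIONS, ESTIMATED_BINARY_INTERACTIONS)
remaining = ESTIMATED_BINARY_INTERACTIONS - CATALOGUED_INTERACTIONS
print(f"catalogued: {fraction:.1%}; still unmapped: {remaining:,}")
```

On these numbers, roughly a quarter of the estimated interactions are catalogued.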


There are two main approaches for detecting 
interacting proteins: techniques that measure 
direct physical interactions between protein 
pairs — binary approaches — and those that 
measure interactions among groups of proteins 
that may not form physical contacts — co- 
complex methods (see “Tools for the search’). 

The most frequently used binary method is 
the yeast two-hybrid (Y2H) system. It has variations
involving different reagents, and has been
adapted to high-throughput screening. The 
strategy interrogates two proteins, called bait 
and prey, coupled to two halves of a transcrip- 
tion factor and expressed in yeast. If the proteins 
make contact, they reconstitute a transcription 
factor that activates a reporter gene. 

Another method for identifying binary 
interactions is luminescence-based mamma- 
lian interactome mapping (LUMIER), a high- 
throughput approach developed by Jeff Wrana 
at the Samuel Lunenfeld Research Institute in
Toronto, Canada. This strategy fuses Renilla
luciferase (RL) enzyme, which catalyses light-emitting
reactions, to a bait protein, which is
expressed in a mammalian cell along with can- 
didate protein partners tagged with a polypep- 
tide called Flag. Researchers use a Flag antibody 
to immunoprecipitate all proteins with the Flag 
tag, along with any that interact with them. 
Interactions between the RL-fused bait and 
the Flag-tagged prey are detected when light 
is emitted. Other binary methods include the 
mammalian protein-protein interaction trap 
and techniques based on proteome chips. 

The most common co-complex method is 
co-immunoprecipitation (coIP) coupled with 
mass spectrometry (MS). In this approach, a 
protein bait is tagged with a molecular marker. 
Several types of tags are commercially available; 
each requires a distinct biochemical technique 
to recognize the tag and fish the bait protein out 
of the cell lysate, bringing with it any interacting 
proteins. These are then identified by MS. 

In addition to these empirical methods, 
researchers have used computational techniques 
to predict interactions on the basis of factors 
such as amino-acid sequence and structural 
information. “People ask ‘Why are you predicting
interactions when you can just do the experiment?’”
says Gary Bader, a bioinformatician at
the University of Toronto. “But experimental 
techniques fail for some proteins.” 


FALSE READINGS 

Every step of a procedure to detect protein-
protein interactions — from the reagents used 
to the cell types and experimental conditions — 
influences the proteins that are identified. Two 
studies this year used similar methods to iden- 
tify interacting proteins in transcription factors 
in embryonic stem cells; there was incomplete
overlap between the resulting data sets. “If you
use the same protocol you will get reproducible
lists of proteins. But different labs use different
protocols, which affects the end result,” says
Raymond Poot, a cell biologist at Erasmus MC
hospital in Rotterdam, the Netherlands, and
lead author of one of the studies. 

In his protocol, Poot pulled interacting pro- 
teins from cells using nuclear extracts express- 
ing different Flag-tagged transcription factors. 
He added a nuclease to his reactions to remove 
DNA and eliminate possible artefacts caused 
by proteins binding to it. “Transcription fac- 
tors bind to DNA so you are likely to pull out 
DNA-binding factors that are not directly inter- 
acting,” he explains. Purifying many different 
transcription factors with the same protocol 
also enabled the researchers to determine which 
interactions were most likely to be specific. For 
example, proteins that consistently co-purified 
with all transcription factors would be treated 
as unlikely to indicate a genuine interaction. 
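
The filtering logic described here can be sketched in a few lines. This is an illustrative sketch only, not Poot's actual analysis: the function name, the bait and prey names and the `background_fraction` threshold are all assumptions introduced for the example.

```python
from collections import Counter

def remove_common_background(pulldowns, background_fraction=1.0):
    """Split prey proteins into bait-specific hits and shared background.

    `pulldowns` maps each bait name to the set of proteins co-purified with it.
    A prey seen with at least `background_fraction` of all baits is treated as
    non-specific background (1.0 means it co-purified with every bait).
    """
    if not pulldowns:
        raise ValueError("at least one pulldown is required")
    n_baits = len(pulldowns)
    seen_with = Counter(
        prey for preys in pulldowns.values() for prey in set(preys)
    )
    background = {
        prey for prey, n in seen_with.items()
        if n / n_baits >= background_fraction
    }
    specific = {
        bait: set(preys) - background for bait, preys in pulldowns.items()
    }
    return specific, background

# Hypothetical bait and prey names.
pulldowns = {
    "TF_A": {"P1", "P2", "STICKY"},
    "TF_B": {"P3", "STICKY"},
    "TF_C": {"P2", "STICKY"},
}
specific, background = remove_common_background(pulldowns)
print(sorted(background))
print({bait: sorted(preys) for bait, preys in specific.items()})
```

The approach only makes sense when many baits are purified with the same protocol; with a single bait, every prey would trivially count as background.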


Calling out false positives — reported inter- 
actions that don’t actually occur — and false 
negatives — interactions that do occur but are 
not picked up by the experimental protocol or 
are discarded — is one of the main challenges 
in the field. “Normally when you do a coIP followed
by MS you will get hundreds of protein
candidates interacting with any one bait,” says 
Wade Harper, a cell biologist at Harvard Medi- 
cal School in Boston, Massachusetts. “When 
you weed out all the stochastic and non-specific 
interactions you end up with many fewer pro- 
teins. Some proteins in large complexes might 
have 30-50 partners, others only 4-5.” 

One way in which researchers increase the 
accuracy of their results is to use more than one 
method (for example, Y2H plus LUMIER) to 


Tools for the search 




Methods such as the yeast two-hybrid system allow scientists to work out which proteins interact. 


The two main methods for finding 
protein-protein interactions are the 
yeast two-hybrid (Y2H) system and 
co-immunoprecipitation followed by mass 
spectrometry. Several companies sell 
reagents for both approaches. Invitrogen of 
Carlsbad, California, sells the ProQuest Two- 
Hybrid System with Gateway Technology. 
This is based on Y2H, with modifications 
to decrease false-positive results and allow 
rapid characterization, says the company. 
Other firms provide vectors used to produce 
proteins with affinity tags, which can easily 
be immunoprecipitated along with other 
interacting proteins. A polypeptide tag called 
Flag is popular among researchers, and 
Sigma Aldrich of St Louis, Missouri, provides 
several Flag-genes for purchase. Promega 
in Madison, Wisconsin, has the HaloTag 
technology, in which a protein of interest 
is expressed in fusion with a tag protein 
engineered from a bacterial enzyme. This 
tag can be used to purify the protein, and 
any interacting with it, by binding to a resin. 
The tag is cleaved off using a protease. 

For researchers who don’t have the time 
or infrastructure to do the experiments, 


companies such as Hybrigenics in Paris 
and Dualsystems Biotech of Schlieren, 
Switzerland, offer Y2H-based screening. 
“We have complex libraries with ten times 
more independent clones than most other 
libraries, which we screen to saturation. 
And rather than screening full-length 
proteins, we screen for interactions with 
domains,” says Etienne Formstecher, 
director of scientific projects and sales at 
Hybrigenics. “Full-length proteins can have 
some domains buried and not available to 
interact, at least in yeast where you may 
not have signals to unlock a closed protein 
conformation.” A customer is given a list 
of proteins that interact with the protein 
of interest; it indicates which domains are 
making contact and provides a confidence 
score for each interaction. 

Innoprot in Derio, Spain, provides 
an interaction service using tag-based 
purification designed for high-throughput 
analysis. And Invitrogen’s ProtoArray 
Protein-Protein Interaction Service uses 
microarrays containing more than 9,000 
human proteins to identify proteins that 
interact with any protein of interest. L.B.

detect the interactions. But the definition of a
‘real’ interaction depends on the context. “Does
a real interaction mean that two proteins interact
if they are placed next to each other in a
test tube, or that they must interact in a cell? Or 
does real mean that the interaction should have 
a biological function?” asks Ideker. Researchers 
can home in on functional interactions by com- 
bining data on interactions with other types of 
biological information, such as genetic interac- 
tions, protein localizations or gene expression. 
For instance, proteins whose genes are co- 
expressed are likely to interact with each other 
or to be part of the same complex or pathway. 
Many tools are available on the web for inte- 
grating different types of information about a 
given protein or gene. One is GeneMANIA, 
developed by Bader’s group in collaboration 
with Quaid Morris, a computational biolo- 
gist also at the University of Toronto. A user 
enters the gene names into GeneMANIA; 
the program provides a list of genes that are 
functionally similar or have shared properties, 
such as similar expression or localization, and 
then displays a proposed interaction network, 
showing relationships among the genes and 
the type of data used to gather that informa- 
tion. The user can click on any node to obtain 
information about the gene and on any link 
to obtain information about their relationship 
(such as citations for any published studies or 
other sources of data). “It’s like a Google for 
genetic and protein information,” says Bader. 
Other web-based interfaces that predict 
gene functions include STRING (http://string- 
db.org) developed at the European Molecular 
Biology Laboratory in Heidelberg, Germany. 
It hunts for protein interactions on the basis of 
genomic context, high-throughput experiments, 
co-expression and data from the literature. 


KEEPING SCORE 

To select real protein-protein interactions, 
Harper and some members of his lab, Matt Sowa 
and Eric Bennett, developed a software platform 
called CompPASS to assign confidence scores 
to an interaction detected by MS. CompPASS
takes data sets of interacting proteins (including 
those identified in experiments) and measures 
frequency, abundance and reproducibility of 
interactions to calculate the score. 
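
To show how frequency, abundance and reproducibility might be folded into one number, here is a toy score. It is emphatically not the published CompPASS formula; the function, its inputs and the way the three terms are combined are assumptions made purely for illustration.

```python
import math

def toy_confidence(n_baits, baits_with_prey, spectral_counts):
    """Toy confidence score for one bait-prey pair (not the CompPASS formula).

    n_baits          total number of baits in the data set
    baits_with_prey  number of baits with which this prey was detected
    spectral_counts  per-replicate spectral counts for this pair (0 = missed)
    """
    if not spectral_counts:
        raise ValueError("at least one replicate is required")
    if not 1 <= baits_with_prey <= n_baits:
        raise ValueError("baits_with_prey must lie between 1 and n_baits")
    detected = [count for count in spectral_counts if count > 0]
    if not detected:
        return 0.0
    rarity = n_baits / baits_with_prey                       # frequency term
    abundance = sum(spectral_counts) / len(spectral_counts)  # mean count
    reproducibility = len(detected) / len(spectral_counts)   # replicate term
    return math.sqrt(rarity * abundance) * reproducibility

# Hypothetical pairs: a prey unique to one bait versus one seen with all 32.
print(toy_confidence(32, 1, [8, 8]))
print(toy_confidence(32, 32, [8, 8]))
```

In this sketch a prey found with only one bait, at high abundance and in every replicate, scores highest, whereas a prey found with every bait is penalized as probable background.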

This year, Harper used CompPASS to iden- 
tify interactions among proteins involved in 
autophagy, the process by which cellular pro- 
teins and organelles are engulfed into vesicles 
and delivered to the lysosome to be degraded. 
Starting with 32 proteins known to have a role 
in autophagy, they identified 2,553 interacting 
proteins using coIP-MS. CompPASS then narrowed
the list down to 409 high-confidence
interacting proteins with 751 interactions.

Ideker’s group used a different approach 
to map interactions among human mitogen- 
activated protein kinases (MAPKs), which 
respond to external stimuli and regulate cell 
function. Having used Y2H to identify more
than 2,000 interactions among known MAPKs,
Ideker used evidence including conservation
of interactions among different species to winnow
that down to a core network of 641 high-confidence
interactions.

From a full network, researchers can zoom in on specific interactions that might be functionally relevant.

For some of the proteins there was no pre- 
vious evidence of interactions with MAPKs. 
Ideker and his colleagues knocked down the 
expression of these proteins using RNA inter- 
ference, then looked for the effect of the knock- 
downs on proteins known to be activated by 
MAPKs. This allowed them to confirm that 
about one-third of their interactions had a role 
in MAPK signalling. 

These methods are helping to weed out false 
positives and provide associated confidence 
scores, but the problem of false negatives per- 
sists. “With these assays we try to get false posi- 
tives down to zero. The hit you take is on false 
negatives. So now you can be highly confident 
of your data but you are probably probing only 
about 20% of the interactome,” says Ideker. “We
would like to get every interaction but we do 
not get even close with current technologies.” 

New methods may become available to iden- 
tify interactions that escape detection by cur- 
rent techniques (see ‘Real-time analysis’). In 
the meantime, one way to address the problem 
is to combine procedures for detecting inter- 
actions, each sampling a different portion of 
the interactome. The interaction data obtained 
in an experiment can also be combined with 
that available in public databases, thus provid- 
ing a more complete picture, says Bader. 


FROM DATA TO NETWORKS 

Protein-protein interactions are only the raw 
material for networks. To build a network, 
researchers typically combine interaction data 
sets with other sources of data. Primary data- 
bases that contain protein-protein interactions 
include DIP (http://dip.doe-mbi.ucla.edu), 
BioGRID, IntAct (www.ebi.ac.uk/intact) and 
MINT (http://mint.bio.uniroma2.it). These
databases have committed to making records 


available through a common language called 
PSICQUIC, to maximize access. 

Other types of data that can be combined with 
protein-protein interactions include informa- 
tion on gene expression, cellular co-localization 
of proteins (based on microscopy), genetic 
information, metabolic and signalling pathways, 
and data from high-throughput assays. 

“One challenge computationally is integrating
heterogeneous data sets to build a network
model,” says Ilya Shmulevich, a professor at the
Institute for Systems Biology in Seattle, Washington.
The second challenge is to decide on a
modelling approach. “It will depend on what 
kind of data you have available and how you 
will be using the model,” says Shmulevich. 

Several bioinformatic tools have been devel- 
oped to model and represent networks. The 
most widely used ones are associated with 
Cytoscape (www.cytoscape.org), an open- 
source program for visualizing networks and 
for integrating them with other types
of data. Several Cytoscape plug-ins allow users 
to download and explore databases. 

Commercial packages with similar functions 
include MetaCore from GeneGO in St Joseph, 
Michigan; Pathway Analysis from Ingenu- 
ity Systems in Redwood City, California; and 
Pathway Studio from Ariadne Genomics in 
Rockville, Maryland. These can access public 
sources of data as well as the company’s propri- 
etary databases. “One of the unique features of 
Pathway Studio is the openness of our system 
and the ability to integrate many different kinds 
of data,” says David Denny, director of marketing
and product management at Ariadne.


GUILT BY ASSOCIATION 

One reason for developing networks is to help 
assign functions to proteins through guilt by 
association. But “a huge slice of the proteome 
consists of proteins that no one knows what 
they do or interact with”, says Benjamin Cravatt,
a chemical physiologist at the Scripps Research 
Institute in San Diego, California. 


For proteins not yet assigned to a portion 
of the human interaction network, Cravatt’s 
group developed a technology for assigning 
protein functions by exploiting an interac- 
tion between enzymes and chemical reagents 
dubbed activity-based probes. These probes 
consist of a reactive group that binds the active 
sites of many members of an enzyme family, 
and a reporter tag that is used for the detection
and identification of the probe-labelled
enzymes.

Because these probes bind only to enzymes
that are active, they can give insights into the
enzymes’ functions. For example, if a probe
binds to a set of enzymes in a cancer cell but not
in a normal cell, it means that these enzymes
become more active in the cancer cell and so 
may have a role in cell growth. The activity 
probes can also serve as assays for the discov- 
ery of inhibitors for a particular enzyme, which 
may help researchers to understand the role 
of that enzyme. “You can develop an inhibitor 
for an enzyme before ever knowing what the 
actual substrate is,” says Cravatt. 

This year, he developed another strategy that 
not only determines differences in enzyme 
activities in different cells, but also pinpoints 
where in the protein these differences occur, 
providing a more quantitative measure of the 
differences. The activities of many families
of enzymes are regulated or fine-tuned by 
cysteine modifications. By looking specifically 
for changes in cysteine modifications across 
the proteome, he found 
‘hyper-reactive’ cysteine 
residues in several pro- 
teins of unknown func- 
tion, which suggests that 
they probably have roles 
in signalling pathways. 

One challenge in defining protein-protein
interaction networks is that interactions vary
depending on the type of cell and the cellular
environment. For example, Wrana mapped the
protein-protein interaction network for TGF-β,
a growth factor that regulates cell functions, 
and found that two proteins that pass on the 
signals from the factor inside the cell — Smad2 
and Smad4 — interact with one another only 
when the cells are stimulated with TGF-β. If
the cells are not stimulated, these two proteins 
don’t come into contact.

Bennett, Harper and Steven Gygi, a cell biol- 
ogist also at Harvard Medical School, devel- 
oped a proteomics platform centred around a 
technology called multiplex absolute quanti- 
fication (AQUA) to look at dynamic changes 
in protein interaction networks. AQUA uses 
synthetic peptides that contain stable isotopes 
as internal standards for the native peptides 
that are produced when proteins from a cell
lysate are digested. Using tandem
MS, researchers can compare the lev- 
els of native and synthetic peptides 
in a cell to obtain a measure of the 
amount of native proteins present. 
Synthetic peptides can also be pre- 
pared with modifications, such as 
extra phosphate groups, to measure 
the number of post-translationally 
modified proteins. “We are pursuing 
the dynamics of protein networks by 
quantifying changes in the amount 
of proteins present in specific protein 
complexes,” says Harper. “Techniques 
such as AQUA provide an accurate 
and sensitive measure of how the 
stoichiometry of components within 
complexes that make up a network are altered 
in response to a stimulus.” 
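
The arithmetic behind this kind of quantification is a simple ratio against the spiked-in standard. The sketch below assumes that signal scales linearly with amount and that native and isotope-labelled peptides respond identically in the mass spectrometer; the function names, peak areas and amounts are hypothetical.

```python
def native_amount_fmol(native_signal, standard_signal, standard_amount_fmol):
    """Estimate the native peptide amount from its labelled internal standard."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return native_signal / standard_signal * standard_amount_fmol

def stoichiometry(amount_a, amount_b):
    """Molar ratio of component A to component B within a complex."""
    if amount_b <= 0:
        raise ValueError("amount of component B must be positive")
    return amount_a / amount_b

# Hypothetical peak areas; 50 fmol of each labelled standard was spiked in.
scaffold = native_amount_fmol(2.0e6, 1.0e6, 50.0)
adaptor = native_amount_fmol(5.0e5, 1.0e6, 50.0)
print(scaffold, adaptor, stoichiometry(scaffold, adaptor))
```

Repeating the calculation before and after a stimulus would show how the ratio of components in a complex shifts, which is the kind of change the approach is meant to capture.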

The team used the approach to describe 
the rearrangements that occur in the protein 
network of cullin-RING ubiquitin ligases, 
enzymes that regulate protein turnover, under 
various cellular conditions.


DEVELOPMENTS IN DIAGNOSTICS 

Changes in protein-protein interaction 
networks may provide information about the 
mechanisms of disease. Last year, Wrana applied 
the network approach to the diagnosis of 
breast cancer. He used microarrays to measure 
genome-wide protein expression in the tumours 
of people with breast cancer, and then overlaid 
the expression data on the network diagram of 
the human interactome. 

Wrana had noted that ‘hub’ proteins, defined 
as those that interact with more than four 
others, can be grouped into two categories 
depending on whether they are expressed at 
the same time as the proteins with which they 
interact. When they looked at breast-cancer 
samples, Wrana and his colleagues found that 


Real-time analysis 


In November, Pacific Biosciences of Menlo 
Park, California, commercially released 

its third-generation DNA-sequencing 
platform, based on its single-molecule, 
real-time (SMRT) technology. A single DNA 
polymerase bound to a DNA template is 
attached to a tiny chamber illuminated 

by lasers, and nucleotides labelled with 
coloured fluorophores are introduced to 

it. As the polymerase incorporates them, 
each base is held for a few microseconds, 
while the fluorophore emits coloured light 
corresponding to the base identity. SMRT 
technology could also be used to analyse 
biomolecules other than DNA, and could 
become a common tool for detecting protein 
interactions, with some unique features. “This

Web tools such as GeneMANIA integrate data on a protein or gene.


certain hub proteins were in a different category 
in breast-cancer patients with a good prognosis 
than in those with a poor prognosis. 

Thus, by overlaying the expression pattern 
ofa cancer cell from an individual patient onto 
the human interactome network, Wrana could 
predict a patient’s prognosis. “We found that 
the detection of global changes in network 
organization is more predictive of outcome 
than is gene expression alone,’ says Wrana. 
“We have now applied this method to other 
tumour models and obtained similar results.” 

KAYAK (kinase activity assay for protein 
profiling) is another approach to developing 
diagnostic tools for cancer on the basis of the 
functional consequences of the interaction 
between a protein, in this case a kinase, and 
its substrate. In this method, up to 90 peptide 
substrates for kinases are used to simulta- 
neously measure the addition of phosphate 
groups to proteins in a cell lysate — in essence 
providing a ‘phosphorylation signature’ for 
that particular cell. “The readout is so sensi- 
tive and so quantitative that even small differ- 
ences are teased out,’ says Gygi, who helped to 


technology can detect relatively 
weak interactions,’ says Jonas 
Korlach, a scientific fellow at 
Pacific Biosciences, adding that 
it could pick out interactions 
that happen so quickly that they 
can’t be identified by current 
methods. 

As a step towards such applications, 
Joseph Puglisi, a structural biologist at 
Stanford University School of Medicine in 
California, and his group, with scientists at 
Pacific Biosciences, observed transfer RNAs 
binding to single ribosomes in real time’’. In 
an unpublished follow-up, Puglisi’s group has 
used SMRT technology to watch interactions 
between transfer RNAs, ribosomes and 
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develop the method”. 

According to Gygi, the biggest appli- 
cation of KAYAK might be in tumour 
classification. “Biopsies or excised tis- 
sues can be profiled for kinase activities 
with pinpoint accuracy. These patterns 


- br ; could contribute towards personalized 


* drug treatments based on dysregulated 
» » kinase pathways,” he says. 

: The combination of different types 
of data and technologies should con- 
tinue to fill in the empty spaces of the 
current human interactome map. The 
picture may never be complete, but it 
will continue to provide insights into 
cellular mechanisms of health and 
disease. “I think that the network 
we have is dense enough for us to start doing 
studies to classify disease states,” says Wrana. 
“As the networks become better and coverage 
improves, the accuracy of diagnosis will also 
improve.” = 
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Laura Bonetta is a freelance science writer 
based in Garrett Park, Maryland. 
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Future SMRT systems could reveal interactions. 


protein factors to determine how the 
translation machinery synthesizes proteins. 
“We have just seen the tip of the iceberg in 
terms of applications,” says Korlach. 1.8. 
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hospital in Rotterdam, the Netherlands, and 
lead author of one of the studies. 

In his protocol, Poot pulled interacting pro- 
teins from cells using nuclear extracts express- 
ing different Flag-tagged transcription factors. 
He added a nuclease to his reactions to remove 
DNA and eliminate possible artefacts caused 
by proteins binding to it. “Transcription fac- 
tors bind to DNA so you are likely to pull out 
DNA-binding factors that are not directly inter- 
acting,” he explains. Purifying many different 
transcription factors with the same protocol 
also enabled the researchers to determine which 
interactions were most likely to be specific. For 
example, proteins that consistently co-purified 
with all transcription factors would be treated 
as unlikely to indicate a genuine interaction. 
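The filtering logic in this protocol can be sketched in a few lines. This is an illustration only, assuming each pull-down is recorded as a set of co-purified protein names keyed by the tagged transcription factor. The function name `drop_ubiquitous_preys` and the protein labels are hypothetical, and the strict "found with every bait" rule is a simplification of whatever threshold the researchers actually applied.

```python
def drop_ubiquitous_preys(pulldowns):
    """Remove proteins that co-purify with every bait.

    pulldowns: dict mapping bait name -> iterable of co-purified protein names.
    Returns a dict with the same keys and the ubiquitous proteins removed.
    With fewer than two baits nothing can be judged non-specific, so the
    input is returned unchanged (as sets).
    """
    as_sets = {bait: set(preys) for bait, preys in pulldowns.items()}
    if len(as_sets) < 2:
        return as_sets
    # Proteins present in every pull-down are treated as likely background.
    ubiquitous = set.intersection(*as_sets.values())
    return {bait: preys - ubiquitous for bait, preys in as_sets.items()}


# Toy data: "HSP70" sticks to every tagged factor, so it is discarded.
pulldowns = {
    "TF_A": {"HSP70", "X"},
    "TF_B": {"HSP70", "Y"},
    "TF_C": {"HSP70"},
}
specific = drop_ubiquitous_preys(pulldowns)
```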


Calling out false positives — reported inter- 
actions that don’t actually occur — and false 
negatives — interactions that do occur but are 
not picked up by the experimental protocol or 
are discarded — is one of the main challenges 
in the field. “Normally when you do a coIP followed
by MS you will get hundreds of protein
candidates interacting with any one bait,” says 
Wade Harper, a cell biologist at Harvard Medi- 
cal School in Boston, Massachusetts. “When 
you weed out all the stochastic and non-specific 
interactions you end up with many fewer pro- 
teins. Some proteins in large complexes might 
have 30-50 partners, others only 4-5.” 

Tools for the search

Methods such as the yeast two-hybrid system allow scientists to work out which proteins interact.

The two main methods for finding
protein-protein interactions are the
yeast two-hybrid (Y2H) system and
co-immunoprecipitation followed by mass
spectrometry. Several companies sell
reagents for both approaches. Invitrogen of
Carlsbad, California, sells the ProQuest Two-Hybrid
System with Gateway Technology.
This is based on Y2H, with modifications
to decrease false-positive results and allow
rapid characterization, says the company.
Other firms provide vectors used to produce
proteins with affinity tags, which can easily
be immunoprecipitated along with other
interacting proteins. A polypeptide tag called
Flag is popular among researchers, and
Sigma-Aldrich of St Louis, Missouri, provides
several Flag-genes for purchase. Promega
in Madison, Wisconsin, has the HaloTag
technology, in which a protein of interest
is expressed in fusion with a tag protein
engineered from a bacterial enzyme. This
tag can be used to purify the protein, and
any proteins interacting with it, by binding to a resin.
The tag is cleaved off using a protease.

For researchers who don’t have the time
or infrastructure to do the experiments,
companies such as Hybrigenics in Paris
and Dualsystems Biotech of Schlieren,
Switzerland, offer Y2H-based screening.
“We have complex libraries with ten times
more independent clones than most other
libraries, which we screen to saturation.
And rather than screening full-length
proteins, we screen for interactions with
domains,” says Etienne Formstecher,
director of scientific projects and sales at
Hybrigenics. “Full-length proteins can have
some domains buried and not available to
interact, at least in yeast where you may
not have signals to unlock a closed protein
conformation.” A customer is given a list
of proteins that interact with the protein
of interest; it indicates which domains are
making contact and provides a confidence
score for each interaction.

Innoprot in Derio, Spain, provides
an interaction service using tag-based
purification designed for high-throughput
analysis. And Invitrogen’s ProtoArray
Protein-Protein Interaction Service uses
microarrays containing more than 9,000
human proteins to identify proteins that
interact with any protein of interest. L.B.


One way in which researchers increase the
accuracy of their results is to use more than one
method (for example, Y2H plus LUMIER) to
detect the interactions. But the definition of a
‘real’ interaction depends on the context. “Does
a real interaction mean that two proteins interact
if they are placed next to each other in a
test tube, or that they must interact in a cell? Or
does real mean that the interaction should have
a biological function?” asks Ideker. Researchers
can home in on functional interactions by combining
data on interactions with other types of
biological information, such as genetic interactions,
protein localizations or gene expression.
For instance, proteins whose genes are co-expressed
are likely to interact with each other
or to be part of the same complex or pathway.

Many tools are available on the web for inte- 
grating different types of information about a 
given protein or gene. One is GeneMANIA, 
developed by Bader’s group in collaboration 
with Quaid Morris, a computational biolo- 
gist also at the University of Toronto. A user 
enters the gene names into GeneMANIA; 
the program provides a list of genes that are 
functionally similar or have shared properties, 
such as similar expression or localization, and 
then displays a proposed interaction network, 
showing relationships among the genes and 
the type of data used to gather that informa- 
tion. The user can click on any node to obtain 
information about the gene and on any link 
to obtain information about their relationship 
(such as citations for any published studies or 
other sources of data). “It’s like a Google for 
genetic and protein information,” says Bader. 

Other web-based interfaces that predict
gene functions include STRING (http://string-db.org),
developed at the European Molecular
Biology Laboratory in Heidelberg, Germany.
It hunts for protein interactions on the basis of 
genomic context, high-throughput experiments, 
co-expression and data from the literature. 


KEEPING SCORE 

To select real protein-protein interactions, 
Harper and some members of his lab, Matt Sowa 
and Eric Bennett, developed a software platform 
called CompPASS to assign confidence scores 
to an interaction detected by MS. CompPASS
takes data sets of interacting proteins (including 
those identified in experiments) and measures 
frequency, abundance and reproducibility of 
interactions to calculate the score. 
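To make the three ingredients concrete, here is a toy score that rewards an interaction for being specific to few baits (frequency), backed by many spectral counts (abundance) and seen in several replicates (reproducibility). It is a sketch under stated assumptions, not the published CompPASS scoring scheme: the function name `toy_interaction_scores`, the input layout and the way the three terms are combined are all chosen for illustration.

```python
from math import sqrt


def toy_interaction_scores(runs):
    """Score bait-prey pairs from replicate pull-downs.

    runs: dict mapping bait -> list of replicates, where each replicate is a
          dict mapping prey protein -> spectral count (hypothetical layout).
    Returns a dict mapping (bait, prey) -> score; higher means more confident.
    """
    n_baits = len(runs)

    # Frequency: in how many baits' pull-downs does each prey appear at all?
    baits_with_prey = {}
    for replicates in runs.values():
        for prey in set().union(*replicates):
            baits_with_prey[prey] = baits_with_prey.get(prey, 0) + 1

    scores = {}
    for bait, replicates in runs.items():
        for prey in set().union(*replicates):
            counts = [rep.get(prey, 0) for rep in replicates]
            reproducibility = sum(1 for c in counts if c > 0)  # replicates with a hit
            abundance = sum(counts) / len(replicates)          # mean spectral count
            uniqueness = n_baits / baits_with_prey[prey]       # 1.0 if seen with every bait
            scores[(bait, prey)] = sqrt(uniqueness ** reproducibility * abundance)
    return scores


# Toy data: "HSP70" appears with all three baits, "X" only with bait "A".
runs = {
    "A": [{"HSP70": 10, "X": 10}, {"HSP70": 10, "X": 10}],
    "B": [{"HSP70": 10}, {"HSP70": 10}],
    "C": [{"HSP70": 10}, {"HSP70": 10}],
}
scores = toy_interaction_scores(runs)
```

With equal spectral counts, the bait-specific protein "X" outscores the ubiquitous "HSP70", which is the behaviour one would want from any score built on these three measures.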

This year, Harper used CompPASS to iden- 
tify interactions among proteins involved in 
autophagy, the process by which cellular pro- 
teins and organelles are engulfed into vesicles 
and delivered to the lysosome to be degraded. 
Starting with 32 proteins known to have a role 
in autophagy, they identified 2,553 interacting 
proteins using coIP-MS. CompPASS then narrowed
the list down to 409 high-confidence
interacting proteins with 751 interactions.

Ideker’s group used a different approach 
to map interactions among human mitogen- 
activated protein kinases (MAPKs), which 
respond to external stimuli and regulate cell 
function. Having used Y2H to identify more


ANTIBODY COMPANIES

Company | Products/Activity | Location | URL
AbD Serotec | Sources for research and diagnostic antibodies | Kidlington, UK | www.ab-direct.com
Abgent | Antibodies; customized antibodies; protein-expression services | San Diego, California | www.abgent.com
Active Motif | Antibodies and reagents for transcriptional regulation and epigenetics | Carlsbad, California | www.activemotif.com
AdiMab | Human antibodies produced in yeast | Lebanon, New Hampshire | www.adimab.com
AnaSpec | Off-the-shelf and customized antibodies and peptides | Fremont, California | www.anaspec.com
BioGenes | Customized antibodies; immunoassays; peptides | Berlin, Germany | www.biogenes.de
Charles River | Customized antibodies | Wilmington, Massachusetts | www.criver.com
Epitomics | Customized and off-the-shelf antibodies | Burlingame, California | www.epitomics.com
Fitzgerald Industries | Antibody distributors | Acton, Massachusetts | www.fitzgerald-fii.com
GeneTel | Antibodies for proteomics; chicken-antibody specialists | Madison, Wisconsin | www.genetel-lab.com
GENOVAC | Antibodies against G-protein-coupled receptors; genetic immunization | Freiburg, Germany | www.genovac.com
GenWay Biotech | Primary and secondary antibodies (produced in chickens) | San Diego, California | www.genwaybio.com
GTC Biotherapeutics | Human-antibody production in milk from transgenic animals | Framingham, Massachusetts | www.gtc-bio.com
Harlan | Customized antibodies, peptide synthesis, hybridoma development | Indianapolis, Indiana | www.harlan.com
Innovative Research | Customized and off-the-shelf antibodies | Novi, Michigan | www.innov-research.com
Lonza Biologics | Therapeutic antibodies and recombinant proteins | Basel, Switzerland | www.lonzabiologics.com
Mabtech | Antibodies for ELISA and ELISA Spot applications; cytokines | Nacka Strand, Sweden | www.mabtech.com
Maine Biotechnology Services | Hybridoma, polyclonal and monoclonal antibody development, ascites and in vitro production | Portland, Maine | www.mainebiotechnology.com
Millipore | Antibodies; secondary reagents, probes, tests and kits | Billerica, Massachusetts | www.millipore.com
MorphoSys | Human antibodies; HuCAL and HuCAL gold antibodies | Martinsried, Germany | www.morphosys.com
New England Peptide | Customized and off-the-shelf antibodies; peptide arrays | Gardner, Massachusetts | www.newenglandpeptide.com
Novus Biologicals | Customized and off-the-shelf antibodies | Littleton, Colorado | www.novusbio.com
OriGene | Proteins and customized antibodies; collaborative antibody production | Rockville, Maryland | www.origene.com
ProSci | Off-the-shelf and customized antibodies; immunoassays | Poway, California | www.prosci-inc.com
SDIX | Customized and off-the-shelf antibodies | Newark, Delaware | antibodies.sdix.com
SouthernBiotech | Catalogue and customized antibodies and reagents | Birmingham, Alabama | www.southernbiotech.com


SERVICES

Company | Products/Activity | Location | URL
Biomol | Services for chemical synthesis, cell culture and antibody production | Hamburg, Germany | www.biomol.de
Dualsystems Biotech | Yeast two-hybrid screening service and products | Schlieren, Switzerland | www.dualsystems.com
Hybrigenics | Protein-interaction services based on yeast two-hybrid system | Paris, France | www.hybrigenics.com
Innoprot | Protein-interaction service using affinity-tag-based purification | Bizkaia, Spain | www.innoprot.com
Invitrogen | Protein-interaction services based on proteome arrays | Carlsbad, California | www.invitrogen.com


SOFTWARE ANALYSIS TOOLS

Company | Products/Activity | Location | URL
Ariadne | Pathway analysis | Rockville, Maryland | www.ariadnegenomics.com
Array Genetics | Protein database; genomics, proteomics and microarray analysis | Newtown, Connecticut | www.arraygenetics.com
BIOBASE | Biological databases; analysis tools for gene expression | Wolfenbüttel, Germany | www.biobase-international.com
BioCyc | Pathway analysis software plus collection of databases | Menlo Park, California | www.biocyc.com
BioDiscovery | Microarray analysis and discovery software | El Segundo, California | www.biodiscovery.com
Bitplane | Image-analysis software; filament tracers and auto-aligners | Zurich, Switzerland | www.bitplane.com
Bruker Daltonics | Instruments, software and consumables for mass spectrometry | Billerica, Massachusetts | www.bdal.com
Ceiba Solutions | Gene-expression data-analysis system | Cambridge, Massachusetts | www.ceibasolutions.com
Cyberell | Bioinformatics software and services | Helsinki, Finland | www.cyberell.com
Dalicon | Bioinformatics software for managing and analysing large-scale data | Nijmegen, the Netherlands | www.dalicon.com
ePitope Informatics | Epitope predictions; protein-analysis software | Hexham, UK | www.epitope-informatics.com
VOL 468 | NATURE | 855 


SOFTWARE ANALYSIS TOOLS (continued)

Company | Products/Activity | Location | URL
Genedata | Bioinformatics systems and services for sequence and genome analysis | Basel, Switzerland | www.genedata.com
GeneGO | Data mining and analysis solutions in systems biology | St Joseph, Missouri | www.genego.com
Genomatix | Software for studying molecular mechanisms of gene regulation | Munich, Germany | www.genomatix.de
Geospiza | Bioinformatics software; tools for sequence assembly and analysis | Seattle, Washington | www.geospiza.com
IBM Life Sciences | Database integration; data-management systems | Armonk, New York | www.ibm.com
Ingenuity Systems | Information solutions and custom service for life-sciences research | Redwood City, California | www.ingenuity.com
LGC Genomics | Analysis, annotation and visualization of full chromosome sequences | Berlin, Germany | www.lgcgenomics.com
Life Technologies | Maps of interconnected biological signalling and metabolic pathways | Carlsbad, California | www.lifetechnologies.com
LifeSpan BioSciences | Database for drug-target discovery; antibodies and custom antibodies | Seattle, Washington | www.lsbio.com
Metaphorics | Database of protein-ligand structural information | Aliso Viejo, California | www.metaphorics.com
MiraiBio | Bioinformatics solutions | South San Francisco, California | www.miraibio.com
Parc Research | Software for the analysis of glycan mass spectrometry data sets | Palo Alto, California | www.parc.com
Partek | Pattern recognition and interactive visualization software | St Louis, Missouri | www.partek.com
Premier Biosoft | Sequence analysis, primer design and two-hybrid protein interactions | Palo Alto, California | www.premierbiosoft.com


GENERAL MOLECULAR BIOLOGY REAGENTS

Company | Products/Activity | Location | URL
Agilent Technologies | Reagents and instruments for life-sciences research | Santa Clara, California | www.agilent.com
Applied Biosystems | Reagents for molecular- and cell-biology research | Carlsbad, California | www.appliedbiosystems.com
Attagene | Transcription-factor profiling system; software | Morrisville, North Carolina | www.attagene.com
BD Biosciences | Research reagents, bioimaging systems, instrumentations | Franklin Lakes, New Jersey | www.bd.com
Bio-Rad | Products, instruments and software for life-sciences research | Hercules, California | www.bio-rad.com
BMG Labtech | Microplate and array readers and handling systems | Offenburg, Germany | www.bmglabtech.com
Cambrex | Products for molecular- and cell-biology research | East Rutherford, New Jersey | www.cambrex.com
Cole-Parmer | Variety of instruments and reagents | Vernon Hills, Illinois | www.coleparmer.com
EMD Biosciences | Calbiochem, Novabiochem and Novagen product lines | San Diego, California | www.emdbiosciences.com
Enzo Life Sciences | Consumables and assays for molecular biology, gene expression and genomic analysis | New York, New York | www.enzo.com
Horiba | Spectroscopy systems and accessories | Kyoto, Japan | www.horiba.com
Invitrogen | Reagents for cell and molecular biology | Carlsbad, California | www.invitrogen.com
Irvine Scientific | Defined media for cell-culture applications; custom media services | Santa Ana, California | www.irvinesci.com
Lonza | Molecular-biology reagents and systems; advanced chemical synthesis | Basel, Switzerland | www.lonza.com
Metrohm | Laboratory instruments; consumables | Riverview, Florida | www.metrohmusa.com
Merck | Chemicals, kits and reagents | Darmstadt, Germany | www.merck.de
MP Biomedicals | Reagents and chemicals for research | Solon, Ohio | www.mpbio.com
New England BioLabs | Molecular-biology-related reagents, kits and enzymes | Ipswich, Massachusetts | www.neb.com
Pacific Biosciences | Platform for single molecule, real-time detection of biological events | Menlo Park, California | www.pacificbiosciences.com
PerkinElmer | Instruments, reagents and kits for life sciences | Waltham, Massachusetts | las.perkinelmer.com
Princeton Separations | DNA purification columns and reagents, fluorescent protein labelling kits | Adelphia, New Jersey | www.prinsep.com
Promega | Chemicals for mass spectrometry | Madison, Wisconsin | www.promega.com
Sigma-Aldrich | Reagents for chemistry and molecular-biology research, including protein arrays | St Louis, Missouri | www.sigmaaldrich.com
Takara Bio | Reagents, kits and consumables for molecular biology | Shiga, Japan | www.takara-bio.com
Thermo Scientific | Instruments and reagents for life-sciences research | Waltham, Massachusetts | www.thermo.com
Tocris Bioscience | Chemicals for life-sciences research; contract research services | Bristol, UK | www.tocris.com
USB | Chemicals and reagents for molecular biology | Cleveland, Ohio | www.usbweb.com
Wako Chemicals USA | Speciality chemicals; clinical diagnostic reagents | Richmond, Virginia | www.wakousa.com

CAREERS 




PUBLIC HEALTH 


Food-safety sentinels


Disease outbreaks in recent years have revealed the 
vulnerability of food supplies. But they offer opportunities 
for those interested in waging war on microbes. 


BY LAURA CASSIDAY 


In the summer of 2009, a particularly
nasty strain of the bacterium Escherichia
coli, known as O157:H7, infected at least
74 people across 32 US states. They had painful
abdominal cramps and diarrhoea, and
about half of those affected were ill enough to
need hospitalization. Initial attempts to trace
the outbreak to a particular food product left
epidemiologist Karen Neil perplexed.

“Normally, E. coli O157 outbreaks are associated
with ground beef or leafy greens like
spinach or lettuce,” says Neil, who works at the
US Centers for Disease Control and Prevention
(CDC) in Atlanta, Georgia. But according
to questionnaires filled out by the patients, not
everyone had eaten the usual suspects in the
days before their illness. So Neil conducted in-depth
interviews with five patients, and revealed
a common factor: all had eaten the same brand
of refrigerated cookie dough. “This was the first
time that raw cookie dough was associated with
an E. coli O157 outbreak,” says Neil.

Such detective work is common for those 
charged with safeguarding the food supply 


from biological and chemical contaminants. 
At each step along the chain from farm to table, 
researchers are needed to quickly detect and 
respond to problems when they arise. Scientists 
such as Neil work on the front lines of outbreak 
response, whereas others focus on prevention 
or detection strategies or on understanding 
how food-borne pathogens make people sick. 
Employers seek expertise in areas ranging 
from public health and epidemiology to micro- 
biology and behavioural sciences. Although 
federal funding for food-safety research in the 
United States has remained mostly flat since a 
spike after the terrorist attacks on 11 Septem- 
ber 2001, the breadth of the field means that 
there is a consistent demand for researchers in 
government, academia and industry. 


GOOD GROWTH 

As food imports from around the globe 
increase, more personnel will be needed 
to ensure their safety — and government 
scientists are often the first line of defence. 
Inspectors employed by the US Department 
of Agriculture (USDA), the Food and Drug 
Administration (FDA) or private companies 
visit factories, slaughterhouses, restaurants and 
ports. There they examine sanitary conditions 
and manufacturing processes, ensure proper 
food labelling and collect samples for labora- 
tory testing. The USDA, for example, employs 
more than 7,500 food inspectors nationwide, 
with entry-level positions requiring a bachelor’s
degree or one year of job-related experience
in the food industry. The FDA’s Center for
Food Safety and Applied Nutrition (CFSAN) 
in College Park, Maryland, shares monitoring 
duties with the USDA. The CDC works with 
both agencies, as well as with state govern- 
ments, to investigate outbreaks of infectious 
disease. 

Detecting and identifying food-borne path- 
ogens is a slow process. Inspectors must swab 
a food sample onto a Petri dish and culture it 
for at least six hours before laboratory analy- 
sis. According to Steven Musser, director of the 
Office of Regulatory Science at the CFSAN, his 
office has a high demand for scientists with 
experience in developing tools to detect food 
contaminants, especially portable devices that 
work more quickly. “It’s impossible to screen all 
of the material,” he says. “But the more tools we 
can put in the hands of inspectors in the field, 
the more products we can look at. We’re looking
to hire bioinformaticians who know how to
handle very large data sets, because right now
we can generate data much faster than we
can analyse them.”

Researchers also need tools to trace
outbreaks to their source quickly. In the
United States, physicians report suspected
food-borne illnesses to state public-health
departments, where scientists culture
microorganisms from patients’ stool samples
and analyse pathogen DNA with pulsed-field
gel electrophoresis. They report
genetic signatures of detected pathogens
to a CDC database that links the PulseNet
national network of public-health labs and is
used to identify clusters of outbreaks that may
be linked to a common source.
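The cluster-spotting step can be pictured with a small sketch. Everything here is an assumption made for illustration: the record fields (`state`, `pattern`), the pattern labels and the three-case threshold are hypothetical, and the real database and its matching rules are not described in this article.

```python
from collections import defaultdict


def find_clusters(reports, min_cases=3):
    """Group case reports by genetic signature and flag possible clusters.

    reports: list of dicts with hypothetical keys "state" and "pattern"
             (a label for the pathogen's DNA fingerprint).
    Returns a dict mapping pattern -> sorted list of states reporting it,
    for patterns seen in at least min_cases reports.
    """
    states_by_pattern = defaultdict(list)
    for report in reports:
        states_by_pattern[report["pattern"]].append(report["state"])
    return {
        pattern: sorted(set(states))
        for pattern, states in states_by_pattern.items()
        if len(states) >= min_cases
    }


# Toy data: three states report the same fingerprint; one case is unrelated.
reports = [
    {"state": "Ohio", "pattern": "P-001"},
    {"state": "Texas", "pattern": "P-001"},
    {"state": "Maine", "pattern": "P-001"},
    {"state": "Utah", "pattern": "P-207"},
]
clusters = find_clusters(reports)
```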

Closely related strains of some microorgan- 
isms, such as Salmonella, can't be differentiated 
using pulsed-field gel electrophoresis. So the 
CFSAN Office of Regulatory Science employs 
about 130 scientists who work closely with bio- 
tech companies to develop techniques for fin- 
gerprinting food-borne pathogens, including 
whole-genome sequencing, proteomic analysis 
and mass spectrometry. “Data analysis is one of 
the big bottlenecks of this work,” says Musser. 

Also in demand are researchers with exper- 
tise in analytical instrument development, food 
microbiology and mass spectrometry. Musser 
says that the number of new research positions 
available in 2011 will depend on the level of 
FDA funding in the federal budget. “We typically
don’t start the hiring process until Congress
passes the budget in the springtime or
summertime,” he says. 


PUBLIC CONCERN 

Food-safety research at the FDA will get a 
boost if Congress and President Barack Obama 
approve the Food Safety Modernization Bill, 
which the US House of Representatives passed 
in 2009. The Senate passed a similar bill on 30 
November 2010. If lawmakers can work out 
discrepancies, the legislation could be signed 
into law later this year. The Senate version of 
the bill calls for increased surveillance and 
testing of foods by the FDA and includes pro- 
visions for at least 4,000 new field staff at the 
CFSAN and the FDA Center for Veterinary 
Medicine in Rockville, Maryland, in 2011, with 
further yearly increases until 2014. The bill also 
appropriates US$825 million for food-safety 
work at these centres in the first fiscal year. 

In general, funding for food-safety research 
waxes and wanes with public concern over dis- 
ease outbreaks. In the 1990s, ‘mad cow disease’ 
galvanized food-safety research in Europe. At 
the same time, US federal funding for food 
safety increased by more than 60%, spurred
by a deadly 1993 outbreak of E. coli at the fast-food
restaurant chain Jack in the Box. With
the terrorist attacks of 11 September 2001, 
funding priorities shifted to food security. 
More recently, much-publicized outbreaks of 
Salmonella in peanut products and eggs seem 
to have refocused public attention on food 
safety, as reflected by the pending legislation. 

But government support is not enough. 
Collaborations among government, academia 
and industry are essential to keep food safe. In 
Europe, scientists at universities in the Neth- 
erlands collaborate with industry researchers 
through the Top Institute of Food and Nutri- 
tion based in Wageningen, a public-private 
partnership funded by the Dutch government 
that fosters industrially relevant innovations 
in food quality and safety. In January 2011, the 
Top Institute will announce vacancies for grad- 
uate students and postdocs in 20 new research 
projects at member institutes, says Marcel 
Zwietering, a professor of food microbiol- 
ogy at Wageningen University, a participant 
in the partnership that has an internation- 
ally renowned programme in food-safety 
research. 

Basic-science researchers often interact 
with industry and government. Michael Peck, 
programme leader at the Institute of Food 
Research in Norwich, UK, says that much of his 
time is spent working with industry and food- 
safety regulators to apply his research findings 
on the physiology and molecular biology of 
Clostridium botulinum, the food-borne pathogen responsible for botulism.
Similarly, researchers at the University of Georgia Center for
Food Safety in Griffin collaborate closely with scientists at the
nearby CDC. “After my lab developed the first test for detecting
E. coli O157 in food, the CDC began sending us food samples
associated with outbreaks,” says Michael Doyle, director of the
University of Georgia centre. “Now that
some of the state health departments have
gotten good at detecting harmful microbes
in food, they usually only send us samples in
which the organisms are really hard to isolate.”
Frequent collaborations with other sectors
mean that people skills and the ability to work
well in a team are valuable assets for those in the
food-safety field, notes Neil.

“State health departments have gotten good at detecting harmful microbes in food.”
Michael Doyle

Food-safety researchers in industry have 
another motivation: company profits. A food 
recall can cost a company tens of millions of 
dollars in lost profits and liability lawsuits and
even force it out of business, as demonstrated
by the 2008-09 Salmonella outbreak
at the now-defunct Peanut Corporation of 
America. “Safety will always be an important 
thing for food companies to guarantee, because 
one big problem can ruin 100 or more years of 
existence of a brand,” says Zwietering. 


DODGING DISASTER 

To avoid such scenarios, industry scientists 
conduct routine testing of products, perform 
environmental monitoring at factories, and 
work to develop more efficient detection 
methodologies. Technicians may not need 
even a bachelor’s degree for entry, whereas 
team leaders typically have PhDs in areas 
such as food science, chemistry, microbiol- 
ogy or chemical engineering. “Many of our 
former students are now in food-safety positions
with major food companies,” says Shaun
Kennedy, director of the National Center for 
Food Protection and Defense at the Univer- 
sity of Minnesota in St Paul. Zwietering notes 
that in the Netherlands, although the number 
of students interested in food and agricul- 
ture has dropped, the demand from industry 
remains unchanged. 

Food companies typically have more 
resources than universities for training 
researchers in specific skills, and provide 
ample opportunities for career advancement. 
Scott Hood, senior manager for microbiology
and thermal processing at General Mills in
Minneapolis, Minnesota, says that at a large
global food company such as his, it is common
to spend a couple of years in the lab, then
maybe a year or two at a plant seeing the daily
challenges of the production environment, 
before coming back to headquarters to work 
on food safety and quality at the corporate 
level. 

But when every food-industry executive's 
worst nightmare — a suspected outbreak 
of food poisoning — does occur, scientists 
from all sectors work together. The com- 
pany’s food scientists work overtime with 
government agencies to trace the source of 
the outbreak to a specific plant and produc- 
tion run. 

Like many food-safety researchers, Neil 
savours the opportunity to make an impact. 
“It’s a very important and exciting field to be in
right now,” she says. Before joining the staff at
the CDC in June 2010, Neil was a postdoctoral 
fellow in the CDC’s Epidemic Intelligence 
Service, a two-year training programme for 
health professionals interested in applied epi- 
demiology. She conducted field investigations 
in Rhode Island, Missouri and Uganda. 

“Sometimes you don’t know in the morning 
that you'll be leaving for an outbreak inves- 
tigation that afternoon,” she says, “which 
makes the job even more interesting.” ■


Laura Cassiday is a freelance writer based in 
Hudson, Colorado. 




TURNING POINT 


Charalampos Kalodimos 


A biophysicist at Rutgers University in 
Piscataway, New Jersey, Greek-born 
Charalampos Kalodimos is the first person to 
win two key young-investigator awards: the 
Biophysical Society’s 2011 Michael and Kate
Bárány Award in September, and the Protein
Society’s award last year.


Are you a natural-born scientist or a convert? 
I didn’t know I wanted to be a scientist until 
my first year of graduate studies at the Curie 
Institute in Paris. There, for my PhD, I began 
working in bioinorganic chemistry, model- 
ling the binding sites of haemoglobin and 
myoglobin. I became mesmerized by the 
notion that, as a scientist, you can be the first 
person to discover something. But to do so, 
you must excel at two things: finding inter- 
esting questions, and strategically designing 
approaches to address those questions. 


What was your crucial early-career decision? 
As a postdoc, I decided to join Robert 
Kaptein’s lab at Utrecht University in the 
Netherlands. He was one of the pioneers of 
structural biology, helping to develop the 
tools we use to determine the three-dimen- 
sional structure of biomolecules. Joining his 
lab was so important because I was intro- 
duced to biomolecular nuclear magnetic 
resonance (NMR). Using NMR, we worked 
to resolve interesting biological phenomena, 
such as protein-DNA interactions and DNA 
regulation. It was an exciting environment 
because I had the freedom to pursue any 
kind of question. I learned that what I really 
like is figuring out the structures controlling 
how molecular systems work. We tend to use 
NMR imaging because it is so powerful. 


Has your career benefited from tackling 
several research projects at once? 

I am always conducting exploratory research
and writing grants to get the next project 
funded. Sometimes it can take 3-4 years to 
get enough preliminary data to get the fund- 
ing. Although some mentors cautioned me 
that this strategy might be too risky, it worked 
for me. I think it’s just as risky if researchers 
focus all their resources on getting one project 
to work. What was the worst thing that could 
happen? I had a pretty nice back-up plan: I
would go back to Greece and live by the sea. 


Why the United States for your first tenure- 
track position? 

Although the scientific infrastructure is 
good in Europe, junior faculty members are
not completely independent at most places.
You start off with small budgets. What I like 
here is that you get more freedom early in 
your career and get a good start-up package. 
If you fail, there is no one else to blame. If 
you do great, you get all the praise. It’s very 
competitive, but I like that. Everything is fair. 
The only thing that matters is if you do good
science. This is the place to be at a young 
faculty level. I travel to Europe a lot, and I 
advise young researchers to move to the 
United States to go after their dream. 


Are the awards a career turning point? 

It’s great to have your peers recognize and 
appreciate the work you have done. What 
motivates me the most is knowing I was 
nominated for the Bárány Award by Lewis
Kay at the University of Toronto, Canada, 
and Ad Bax at the US National Institutes of 
Health, who pioneered NMR tools. In that 
sense, these awards have only pushed me to 
work harder as we spend the next five to ten 
years continuing to investigate large protein 
complexes, which are still quite challeng- 
ing — their size and complexity make them 
difficult to purify. 


Have you been tempted by any offers to go 
elsewhere since winning these two awards? 
I'm happy at Rutgers, but if there is any chance 
that I can be more productive elsewhere in 
the United States or Europe, then I would 
certainly consider that. Never say never. 


Do you thrive on competition? 

Absolutely, 100%. As a scientist investigating 
biological questions, I invest a lot of time and 
energy in my work. Knowing that someone 
else might get to the answer first provides extra 
motivation to keep pushing harder to get there 
first. Competitiveness is absolutely required in 
science. I thrive in that environment. ■

GRANTS
British funding change 


Some UK research funders are opting to 
give larger and longer-term grants to fewer 
awardees than before. The government-funded
Engineering and Physical Sciences Research Council in Swindon
is restructuring its grants on the basis of research showing that larger, longer
grants result in a higher publication and
citation rate, says spokeswoman Victoria
McGuire. And the Science and Technology
Facilities Council, also in Swindon, has
merged two of its grant mechanisms into
a single scheme to provide better long-term
support, says spokeswoman Julia
Maddock. The Wellcome Trust, a charitable 
foundation based in London, has created 
two types of larger grants of up to £425,000 
(US$661,000) a year for up to seven years, 
which will have fewer recipients than its 
traditional schemes. 


IRELAND 
Biomedical recruitment 


The Biomedical Diagnostics Institute 
(BDI), an academic-industrial partnership 
based at Dublin City University, has 
received a five-year, €19-million 
(US$25-million) grant that will allow
it to recruit up to 40 postdocs and PhD
students in surface chemistry, photonics
and microfluidics. This is the BDI’s second
round of funding from Science Foundation
Ireland (SFI), a government funding body
hit hard by budget cuts this year, and it
will support research and development of
prototype diagnostic devices. BDI director 
Michael Berndt says that the institute is 
diversifying its projects and collaborations 
to secure outside funding because further 
SFI support is uncertain. 


EUROPEAN UNION 


Youth fires innovation 


More funding and job openings could 
arise for early-career scientists in the 
European Union if an EU council's 
recommendations are taken up. The 
Competitiveness Council, which
reviews EU economic affairs, industry
and scientific research, concluded
at a 26 November meeting that young
researchers help to stimulate innovation 
and create a science-based culture, and 
urged the EU to find ways to attract and 
retain them. It said researcher mobility is 
important and must be ensured through 
retention of pension rights and other 
benefits. In February 2011, EU nation 
leaders will discuss economic reform. 


DECEMBER 2010 | VOL 468 | NATURE | 859


THE CLEVEREST MAN IN THE WORLD 


BY TONY BALLANTYNE 


“Hi, this is Clark Maxwell, the
cleverest man in the world. Ten
seconds, €10,000. Off you go!”

“Clark! My name's Bob. My parachute’s 
broken! What should I do?” 

“Hi Bob. Let me see. GPS has you at 
20,000 feet over Arizona. That’s pretty high 
up! Given a terminal velocity of 180 feet per 
second, you've just under two minutes before 
you hit the ground.” 

“I know! What do I do?” 

“That’s a tough one! Give me a minute to 
think...” 

“What? No! Don't hang...” 

Too late. Clark checked the volume of 
space around Bob on his computer and 
switched to the next call in the queue. 

“Hi, this is Clark Maxwell, the cleverest 
man in the world. Ten seconds, €10,000. Hit 
me!” 

“This is James Sunderland, chief execu- 
tive of eToys. Clark, we've got a spy in the 
company. Every new product we develop, 
our competitors get to market weeks before 
we do.” 

“Spies aren't your only problem then, you 
must be very inefficient in terms of product 
manufacture.” 

“Oh. What should we do?” 

“That’s two questions, James. Just give me 
a second...” 

Clark called up eToys on a second monitor.
Keeping one eye on Bob’s rapid descent, he ran
a number of searches in quick succession. 

“James! You’d have had the answer yourself
if you’d taken the trouble to check your
network audit trails. The plans are being 
deliberately downloaded onto games car- 
tridges as part of the background scenery. 
Your competitors are buying your secrets 
wholesale. Now for your second question, 
may I suggest that you make an appointment 
with my PA to discuss looking at your com- 
pany from top to bottom.” 

“Uh, sure. Thanks, Clark.” 

“Don’t mention it. Bob! How’s it going?”

“Still falling, Clark.”

“I see that. Bob, I want you to look down. 
Do you see the big lake?” 

“Yes. Should I aim for it?” 

“No! But don't you find it beautiful? Calm- 
ing even?” 


“No. Should I?”

“Back soon, Bob... Hi, this is Clark Maxwell,
the cleverest man in the world. Ten seconds, €10,000. What’s
the problem?”

“This is Lewis. Can’t seem to get a girlfriend, Clark.”

“Hmm. That’s because you’re so self-obsessed.
Get a haircut and start paying
attention to someone besides yourself.”

“Hey, can you see me?” 

“No. Never seen you in my life, Lewis.” 

“Then how do you know that’s true? About 
the haircut and everything?” 

“You've got €10,000 to spare and you're 
using it to ask a stranger how to get a girl. 
Anyone who thinks that money solves 
all their problems is probably pretty self- 
obsessed. Time's up!” 

“But...” 

Clark tapped at his keyboard. 

“Hi Bob! I can see you now.”

“How?” 

“I’ve taken control of the plane you
jumped from.”

“Can you do that?” 

“Did I mention I was the cleverest man 
in the world? Hold it, Bob, I’ll be back in a
minute!” 

“I don’t have a minute!”

“Hi, this is Clark Maxwell, the cleverest 
man in the world. Ten seconds, €10,000. 
How can I be of service?” 

“Clark, this is your wife, the smartest 
woman in the world. Have you walked off 
with my car keys again?” 

“Sorry, Lois. Will you be home tonight?” 

“Assuming I get the supercollider fixed. I 
think I know what's causing the problem. It’s 
not its future self it’s interfering with, it’s its 
past self.”

“Sounds cool, dear. Got to go! Hi, this
is Clark Maxwell, the cleverest man in the 
world. Ten seconds, €10,000. Hit it!” 

“Clark, this is Tessa Walkiewicz, Acronym 
News. We’re doing a report on the acceleration
of change and we’d like a few words...”

“Certainly, Tessa. Just a moment... Bob, 
you’re falling too fast. Hold your arms and
legs wide. I’m sure you've seen people do it 
in films!” 

“It looks easier in films, Clark.”

“I know! Just do your best! Marianne is 
jumping out of the plane, right now. She's got 
a spare chute for you.” 

“What plane?” 

“Your plane, Bob. The one you jumped out 
of. It’s right behind you!” 

“Oh! That's clever!” 

“That’s my job... back in a moment, Bob.
Tessa! What's the question?” 

“Well, Clark. Given the growth of the 
Internet and the new paradigms of inter- 
connectivity, people such as yourself are 
emerging as a powerful force for social 
change. Plugged into the world’s data 
streams, you have a view of everything 
changing from minute to minute.” 

“That's not a question, Tessa.” 

“No, that’s an intro, Clark. The question is 
this: given that people are using services such 
as yours more and more, does that mean they 
are getting less intelligent?” 

“I hardly think that many people are using
my service, Tessa. Not at the prices I charge!” 

“Maybe not yours, Clark, but given that 
the answer to any problem you have is only 
a phone call away, why should people think 
for themselves anymore?” 

“Let me turn that around, Tessa. When 
they stop thinking, they stop being people. 
Got to go!” 

“But...” 

“Hi, this is Clark Maxwell, the cleverest 
man in the world. Ten seconds, €10,000. I’m
listening!” 

“Uh, Clark, this is Marianne. I jumped out 
of the plane, I’ve attached myself to Bob.”

“Well done Marianne! What's the prob- 
lem?” 

“It's my chute. It’s failed to open, too. The 
ground’s looking awfully close.” 

“Marianne, thank you! I do like a challenge!
Now, listen to me carefully...” ■


Tony Ballantyne has had short stories 
published in magazines and anthologies 
around the world. His latest novel, Blood 
And Iron, is published by Tor UK. 

Global systematics of arc volcano position


ARISING FROM Grove, T. et al. Nature 459, 694-697 (2009) 


Global systematics in the location of volcanic arcs above subduction 
zones¹,² are widely considered to be a clue to the melting processes that
occur at depth, and the locations of the arcs have often been explained 
in terms of the release of hydrous fluids near the top of the subducting 
slab (see, for example, refs 3-6). Grove et al.⁷ conclude that arc volcano
location is controlled by melting in the mantle at temperatures above
the water-saturated upper-mantle solidus and below the upper limit of
stability of the mineral chlorite and, in particular, that the arc fronts lie
directly above the shallowest point of such melt regions in the mantle.
Here we show that this conclusion is incorrect because the calculated
arc locations of Grove et al.⁷ are in error owing to the inadequate spatial
resolution of their numerical models, and because the agreement that 
they find between predicted and observed systematics arises from a 
spurious correlation between calculated arc location and slab dip. A 
more informative conclusion to draw from their experiments is that 
the limits of chlorite stability (figure 1b of ref. 7) cannot explain the 
global systematics in the depth to the slab beneath the sharply localized 
arc fronts. 

Grove et al.⁷ hypothesize that arc volcano location is controlled by
melting in the mantle at pressure and temperature conditions defined
as ‘P, T_melt’ in their figure 1b. Grove et al.⁷ then use numerical models of
subduction zones to predict arc location and its global systematics. 
They conclude that the agreement between their calculated systematics 
of arc location and observations of real subduction zones** validates 
their hypothesis (figure 3 of ref. 7) but closer inspection of the shape of 
the P, T_melt region casts doubt upon this conclusion. A characteristic
feature of subduction-zone models’ is the narrow thermal boundary 
layer, sub-parallel to and just above the slab surface, which contains the 
temperature range of P, T_melt (~800-850 °C). For all but the slowest
convergence rates, this boundary layer begins close to the depth at 
which the slab is viscously coupled to the wedge. Hence we should 
expect the region enclosing P, T_melt to be a very thin, continuous layer
above the slab, with its shallowest extent at an almost constant depth.
The results of Grove et al.⁷ (green squares in their figure 2) are inconsistent
with this expectation, and raise the suspicion of an error in
their calculations. 

To locate their region of P, T_melt, Grove et al.⁷ determined which
nodes of their 2.3 × 2.3-km computational mesh lay within that P–T
range. Because those conditions occur within a boundary layer only a
few kilometres thick that is inclined at an angle to the mesh, this
procedure did not resolve the full extent of the P, T_melt region. To
check their results, we calculated the temperature fields for subduction
zones on a 1 × 1-km grid, then resampled them to both 2.3-km
resolution and 0.25-km resolution. This was done for a range of
subduction parameters and for each calculation we determined the P,
T_melt region and its shallowest point. We found that at 2.3-km resolution,
the minimum depth of P, T_melt ranged between about 57 and
76 km, consistent with the range found by Grove et al.⁷. On the
0.25 × 0.25-km grid, however, the minimum depth was confined
between 57 and 61 km (Fig. 1a), consistent with the expectations we
describe in the preceding paragraph. At either resolution, the minimum
depth of P, T_melt is independent of the slab dip and of the convergence
rate.
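
The resolution argument can be illustrated with a toy geometric sketch. It is not the thermal calculation of either group: it assumes a hypothetical band 1 km thick lying 1 to 2 km above a planar slab surface and starting at an assumed depth of 57 km, and it simply asks which mesh nodes fall inside that band. The function name, the band geometry and every parameter value are assumptions chosen for illustration.

```python
import math

def shallowest_node_depth(dip_deg, h, z0=57.0, a=1.0, b=2.0, z_max=200.0):
    """Depth (km) of the shallowest mesh node inside a thin inclined band.

    Toy geometry (all values are illustrative assumptions):
    - the slab surface is the line z = x * tan(dip), with z positive downwards;
    - the band holds points lying a..b km above that surface (measured
      perpendicular to it) at depths z >= z0;
    - mesh nodes sit at (i * h, j * h) for integer i, j.
    """
    theta = math.radians(dip_deg)
    s, c = math.sin(theta), math.cos(theta)
    eps = 1e-9  # tolerance for floating-point comparisons
    j = math.ceil(z0 / h - eps)  # first node row at or below z0
    while j * h <= z_max:
        z = j * h
        # Horizontal extent of the band at this depth.
        x_lo = (a + z * c) / s
        x_hi = (b + z * c) / s
        i = math.ceil(x_lo / h - eps)  # first node column at or beyond x_lo
        if i * h <= x_hi + eps:
            return z  # this row has a node inside the band
        j += 1  # the band slipped between nodes; try the next row down
    return None

for dip in range(30, 71, 10):
    coarse = shallowest_node_depth(dip, 2.3)
    fine = shallowest_node_depth(dip, 0.25)
    print(f"dip {dip} deg: 2.3-km mesh {coarse:.2f} km, 0.25-km mesh {fine:.2f} km")
```

In this toy geometry the fine mesh always finds a node in the first row at the top of the band, whereas the coarse mesh can step over the band for one or more rows, so its apparent minimum depth is deeper and scatters from case to case.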

Grove et al.⁷ compare their calculations with seismic studies, which
show that the depth of the slab beneath arcs varies between ~80 and 
~150 km (refs 2, 8) and has a negative correlation with the descent 
speed of the slab (Fig. 1b). The depth to the top of the slab predicted by 
the hypothesis of Grove et al.⁷ applied under our recalculations is
~60-75 km, independent of dip or convergence rate (Fig. 1b), and 
thus does not agree with the observations. 

Figure 1 | Arc position versus subduction parameters for data and models.
a, Calculated depth D_melt of the shallowest portion of the P, T_melt-based melting
field (compare figures 1 and 2 of ref. 7). Calculations were carried out on a 1-km
finite-volume mesh’, for dip of 30° to 70° in steps of 10°, and for convergence
rate V from 30 to 100 mm yr⁻¹, in steps of 10 mm yr⁻¹; these ranges include the
parameters of the calculations of ref. 7. The points correspond to the minimum
depths of melting calculated according to the hypothesis and methods of Grove
et al.⁷ for a 2.3 × 2.3-km resampled grid (open triangles) and for a 0.25 × 0.25-km
resampled grid (filled triangles). b, Diamonds show the depth of the slab
D_slab, determined seismologically* (error bars as described by ref. 2); filled
triangles show the calculated D_slab below the locus of shallowest melting, for the
0.25 × 0.25-km resampled grid from panel a. The red triangles correspond to
the corrected values of D_slab for the combinations of dip and convergence rate
used by ref. 7 (T. Grove et al., personal communication). The grey line
corresponds to a constant D_slab = 62 km. c, This panel corresponds to the lower
200 km of figure 3 in ref. 7. Points as for panel b, plotted for the horizontal
distance between the trench and the arc, which is equal to D_slab/tan(Dip), the
quantity on the y axis of figure 3 of ref. 7. The grey line corresponds to
D_slab = 62 km and demonstrates the spurious correlation referred to in the
main text.

The agreement between model and observations in Grove et al.⁷ is
spurious, and is the result of their choice of variables. Figure 1c recreates
their figure 3, which shows the apparent consistency between
model and observations, using our recalculated location of arcs. The
sine of slab dip is plotted on the x axis, and on the y axis is the arc–trench
distance, which for all points (calculated and observed; see
table 1 in ref. 7) is taken as the depth of the slab divided by the tangent
of the dip. The presence of the sine of the dip on each axis ensures a
spurious correlation; this is illustrated clearly in Fig. 1c by the grey line
that corresponds to a constant value of the depth of the slab,
D_slab = 62 km.
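
The point about a constant slab depth can be checked with a few lines of arithmetic. The sketch below holds the slab depth fixed at 62 km, so it carries no information about melting, and still finds a strong correlation between the sine of the dip and the depth divided by the tangent of the dip. The dip values follow the 30° to 70° range quoted in the figure caption; the helper `pearson_r` and the variable names are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

D_SLAB_KM = 62.0                # constant depth of the slab beneath the arc
dips_deg = [30, 40, 50, 60, 70]

# x axis: sine of the dip; y axis: arc-trench distance = depth / tan(dip).
sin_dip = [math.sin(math.radians(d)) for d in dips_deg]
arc_trench_km = [D_SLAB_KM / math.tan(math.radians(d)) for d in dips_deg]

r = pearson_r(sin_dip, arc_trench_km)
for d, x, y in zip(dips_deg, sin_dip, arc_trench_km):
    print(f"dip {d} deg: sin(dip) = {x:.3f}, arc-trench distance = {y:.1f} km")
print(f"Pearson r = {r:.3f}")
```

Because both plotted quantities are functions of the dip alone when the depth is constant, the strongly negative coefficient reflects only the choice of axes.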

Therefore there is no significance in the match between models and 
observations reported by Grove et al.⁷, and their conclusion that “the
kinematic control on the location of mantle melting is primarily slab 
dip” (page 696 of ref. 7) is mistaken. Instead, we conclude from their 
experiments that the limits of chlorite stability (figure 1b of ref. 7) 
cannot explain the global systematics in the depth of the slab beneath 
sharply localized arc fronts, which is true for any strongly temperature- 
dependent process that takes place near the top of the slab, as we have 
discussed. In ref. 10 we suggest a process that can account for the global 
systematics in location of the arcs. 

REPLYING TO England, P. C. & Katz, R. F. Nature 468, doi:10.1038/nature09154 (2010)


In their Comment, England and Katz¹ suggest that our model² contains
two flaws and that there are additional problems in our thermal
models. This Reply points out an important part of our model that
England and Katz¹ appear to have missed, addresses their suggestion
that there are flaws and discusses whether our thermal models are in
error.

The Comment¹ states that we “conclude that the arc fronts lie
directly above the shallowest point [that satisfies the P, T_melt criterion]
in the mantle”. This corresponds to our path A in figure 1 of ref. 2. The
P, T_melt criterion described in ref. 1 refers to the melting that initiates
just above the slab over a range of depths, illustrated between paths A
and C in figure 1 of ref. 2. As we discussed², when these initial melts
ascend into the overlying mantle wedge, not all of them will experience
a pressure-temperature path that allows them to erupt from
an arc volcano on the Earth’s surface. The melts formed at the shallowest
depths (path A) will encounter cooler mantle as they ascend
into the overlying mantle wedge and these melts will freeze. Only
melts that ascend into the hottest interior portion of the mantle wedge
(such as path B in figure 1 of ref. 2) will undergo sufficient melting to
produce arc front volcanoes. To summarize our findings², there are
two important factors that control the location of arc volcanoes: (1)
chlorite dehydration releases H₂O near the slab–wedge interface, and
the H₂O ascends into overlying mantle that is above the H₂O-saturated
mantle solidus (P, T_melt in figure 1b of ref. 2) and (2) the
temperature of the overlying mantle wedge increases with decreasing
pressure to allow flux melting to continue to high extents and allow
these high-extent melts to erupt at arc volcanoes (path B in figure 1 of
ref. 2).

England and Katz also state that the agreement that we² “find
between predicted and observed systematics arises from a spurious
correlation between calculated arc location and slab dip” (ref. 1). They
attribute this purported spurious correlation in our figure 3 (ref. 2) to
the presence of the tangent function on the vertical axis and a sine
function on the horizontal axis. Although there is trigonometry
involved in the correlation shown on this figure², the relations are
not spurious and are meaningful. The salient point in our figure 3 is
that the beginning of H₂O-saturated melting in our modelling (path A
in figure 1a of ref. 2) consistently occurs at a depth of 60-70 km near
the slab–wedge interface and is independent of the convergence rate
and dip. We point out that these shallowest melts do not reach the
surface (figure 1 of ref. 2), nor do they influence the location of
volcanoes. Instead, the maximum amount of melting and hence the
location of arc volcanoes are controlled by the position of the hottest
part of the wedge above a slab. This is the region between paths B and
C (figure 1 of ref. 2), the region of maximum melting from our models.
The arc–trench distance for paths B to C, and thus the location of arc
volcanoes, is close to the values reported by England et al.³ and parallel
to the trend of Syracuse and Abers⁴. The distance a given isotherm is
from the trench decreases with increasing convergence rate and spans
a range of values that are represented in the data of ref. 3. An interesting
outcome of our thermal modelling (figure 2 of ref. 2) is that at
steep dip angles, paths A and B occur at very similar distances from
the trench.

England and Katz say that our thermal modelling results (in figure
2 of ref. 2) “raise the suspicion of an error in [our] calculations” (ref.
1). England and Katz continue with a discussion of grid size in the
numerical calculations that they performed, but it is impossible for
us or for any reader of the Comment¹ to assess the veracity of their
claim that we² “did not resolve the full extent of the P, T_melt region”
(ref. 1). We have verified our modelling methods using the community
benchmarks developed for subduction zone modelling⁵ and
we also find that our model results for the temperature structure
near the slab–wedge interface are comparable to those of others who
have benchmarked their models, such as Wada and Wang⁶, who
explicitly considered the issues associated with slab–mantle viscous
coupling.

Thus, we disagree with the conclusion reached by England and
Katz¹ that “the limits of chlorite stability cannot explain the global
systematics in the depth of the slab beneath sharply localized arc
fronts”. The conclusions we reached in ref. 2 rely on the interplay
of two important controls on hydrous melting in the mantle wedge
above subducted slabs: the dehydration of chlorite near the base of the
wedge and the temperature structure of the overlying mantle wedge.


T. L. Grove, C. B. Till, E. Lev, N. Chatterjee & E. Médard

Robust multicellular computing using genetically
encoded NOR gates and chemical ‘wires’ 


Alvin Tamsir, Jeffrey J. Tabor & Christopher A. Voigt


Computation underlies the organization of cells into higher-order 
structures, for example during development or the spatial asso- 
ciation of bacteria in a biofilm’ *. Each cell performs a simple 
computational operation, but when combined with cell-cell com- 
munication, intricate patterns emerge. Here we study this process 
by combining a simple genetic circuit with quorum sensing to 
produce more complex computations in space. We construct a 
simple NOR logic gate in Escherichia coli by arranging two tandem 
promoters that function as inputs to drive the transcription of a 
repressor. The repressor inactivates a promoter that serves as the 
output. Individual colonies of E. coli carry the same NOR gate, but 
the inputs and outputs are wired to different orthogonal quorum- 
sensing ‘sender’ and ‘receiver’ devices**. The quorum molecules 
form the wires between gates. By arranging the colonies in different 
spatial configurations, all possible two-input gates are produced, 
including the difficult XOR and EQUALS functions. The response 
is strong and robust, with 5- to >300-fold changes between the ‘on’ 
and ‘off’ states. This work helps elucidate the design rules by which
simple logic can be harnessed to produce diverse and complex 
calculations by rewiring communication between cells. 

Boolean logic gates integrate multiple digital inputs into a digital out- 
put. Electronic integrated circuits consist of many layered gates. In cells, 
regulatory networks encode logic operations that integrate
environmental and cellular signals. Synthetic genetic logic gates have been
constructed, including those that perform AND, OR and NOT
functions, and have been used in pharmaceutical and biotechnological
applications. Multiple gates can be layered to build more complex
programs, but it remains difficult to predict how a combination of
circuits will behave on the basis of the functions of the individual circuits.
Here we have compartmentalized a simple logic gate into separate E. coli
strains and used quorum signalling to allow communication between the
strains. Compartmentalizing the circuit produces more reliable
computation by population-averaging the response. In addition, a program
can be built from a smaller number of orthogonal parts (for example 
transcription factors) by re-using them in multiple cells. 

NOR and NAND gates are unique because they are functionally 
complete. That is, any computational operation can be implemented 
by layering either of these gates alone. Of these, the NOR gate is the
simplest to implement using existing genetic parts. A NOR gate is ‘on’
only when both inputs are ‘off’ (Fig. 1a). We designed a simple NOR
gate by adding a second input promoter to a NOT gate. Tandem
promoters with the same orientation drive the expression of a
transcriptional repressor (Fig. 1b). Tandem promoters are common in
prokaryotic genomes. This arrangement is expected to produce an OR function;
however, interference between the promoters can occur (Supplemen- 
tary Figure 3). The repressor turns off a downstream promoter, which 
serves as the output of the gate. Both the inputs and the output of this 
gate are promoters; thus, multiple gates could be layered to produce 
more complex operations. 
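
As a brief illustration of functional completeness, the Boolean sketch below builds NOT, OR and AND purely from a NOR primitive. It is an abstract software analogy rather than a model of the genetic circuit, and the helper names (nor, not_, or_, and_) are illustrative assumptions.

```python
def nor(a: bool, b: bool) -> bool:
    """NOR is 'on' only when both inputs are 'off'."""
    return not (a or b)


# Other gates expressed using NOR alone (helper names are illustrative).
def not_(a: bool) -> bool:
    return nor(a, a)


def or_(a: bool, b: bool) -> bool:
    return not_(nor(a, b))


def and_(a: bool, b: bool) -> bool:
    return nor(not_(a), not_(b))


if __name__ == "__main__":
    print("A     B     NOR   OR    AND")
    for a in (False, True):
        for b in (False, True):
            print(f"{a!s:5} {b!s:5} {nor(a, b)!s:5} {or_(a, b)!s:5} {and_(a, b)!s:5}")
```

Every other two-input function can be composed in the same way, which is the property that layering NOR gates relies on.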

Figure 1 | The genetic NOR gate. a, b, Symbol, truth table (a) and genetic
diagram (b) of the NOR gate. c, The transfer function is defined as the output as
a function of input at steady state. The transfer functions of PBAD and PTet (top),
the PBAD-PTet tandem promoter (middle), and the NOR gate (bottom) are
shown. The inducer concentrations for the tandem promoter and NOR gate
characterizations are 0, 0.0005, 0.005, 0.05, 0.5 and 5 mM Ara (squares from left
to right) and 0, 0.025, 0.25, 2.5, 25 and 250 ng ml⁻¹ aTc (squares from bottom to
top). Fluorescence values and their error bars are calculated as mean ± s.d.
from three experiments. a.u., arbitrary units.

Each logic gate is encoded in separate strains of E. coli. Acyl homoserine
lactone (AHL) cell-cell communication devices are used as
signal-carrying ‘wires’ to connect the logic gates encoded in different
strains. Gates are connected in series where the output of the first
gate is the expression of the AHL synthase (Pseudomonas aeruginosa
PAO1 LasI or RhlI). AHL diffuses through the cell membrane and
binds to its cognate transcription factor (P. aeruginosa PAO1 LasR or
RhlR). The promoter that is turned on by the transcription factor is
used as the input to the next logic gate. These systems have been used
previously to program cell-cell communication and have been shown
to have little cross-talk. Analogous to a series of electrical gates arrayed
on a circuit board, compartmentalization of genetic gates in individual
cells allows them to be added, removed or replaced simply by changing
the spatial arrangement of the E. coli strains.

The stepwise construction of a NOR gate with PBAD and PTet as the
input promoters and yellow fluorescent protein (YFP) as the output
gene is shown in Fig. 1c. PBAD and PTet are activated in the presence of
arabinose (Ara) and anhydrotetracycline (aTc), respectively. The
individual transfer functions of PBAD and PTet are measured using flow
cytometry (Fig. 1c). An OR gate is constructed by placing the PBAD and
PTet promoters in tandem. PBAD-PTet demonstrates OR logic with
7,000-fold induction between the ‘off’ state (−Ara, −aTc) and the
‘on’ state (+Ara, +aTc). Finally, to convert the OR gate into a NOR
gate, the CI-repressor gene is placed under the control of PBAD-PTet
and YFP is expressed from a second plasmid under the control of the
CI-repressible PR promoter. Whereas the OR gates have some
characteristics of fuzzy logic, the NOR gates are nearly digital (Fig. 1c).

Figure 2 | Input modularity of the gates. a, Transfer functions for three OR
gates (left) are compared with the predicted transfer functions (right). The
predicted transfer function is the simple sum of the transfer functions measured
for the individual promoters (Supplementary Information). The Ara and aTc
concentrations used are the same as in Fig. 1 and those for 3OC12-HSL are 0,
0.001, 0.01, 0.1, 1 and 10 µM (squares from bottom to top). b, Transfer
functions for three NOR gates (left) are compared with the predicted transfer
functions (right). The data represent means calculated from three experiments.

These OR and NOR gates use promoters as inputs. This feature
imparts modularity to the gates; in other words, they can be engineered
to respond to different inputs by replacing the promoters. To
investigate this, we swapped the input promoters of the logic gates. Figure 2
shows the characterization data for three different tandem promoters:
PBAD-PTet, PBAD-PLas and PTet-PLas. The promoter PLas is activated by
the quorum signal 3OC12-HSL (N-3-oxo-dodecanoyl-homoserine
lactone). These gates perform as the additive combination of the
individual transfer functions of the two input promoters and the
CI-repressor NOT gate. The predicted transfer functions for the six logic
gates shown in Fig. 2 matched the experimental results. One tandem
promoter, PTet-PBAD, did not function as predicted (Supplementary
Figure 3). The failure observed for PTet-PBAD probably arises from
some position-dependent interference. This could be the result of 
the effects of DNA looping, the occlusion of transcription-factor-binding 
sites or changes in the ratio or stability of output messenger RNAs, 
among other effects. 
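
The additive prediction described above can be sketched numerically. The example below assumes, purely for illustration, that each promoter and the CI-repressor NOT gate follow Hill functions; the functional form, every parameter value and all function names are hypothetical and are not taken from the measured data.

```python
def hill_activation(x, k, n, y_min, y_max):
    """Activating Hill function: output rises from y_min to y_max with input x."""
    return y_min + (y_max - y_min) * x**n / (k**n + x**n)


def hill_repression(x, k, n, y_min, y_max):
    """Repressing Hill function: output falls from y_max to y_min as x rises."""
    return y_min + (y_max - y_min) * k**n / (k**n + x**n)


# Hypothetical transfer functions (arbitrary units); parameters are not fitted.
def p_bad(ara_mM):
    return hill_activation(ara_mM, k=0.1, n=2, y_min=1.0, y_max=1000.0)


def p_tet(atc_ng_ml):
    return hill_activation(atc_ng_ml, k=5.0, n=2, y_min=1.0, y_max=1000.0)


def ci_not_gate(promoter_activity):
    return hill_repression(promoter_activity, k=50.0, n=2, y_min=1.0, y_max=500.0)


def or_gate(ara_mM, atc_ng_ml):
    # Predicted tandem-promoter output: simple sum of the individual promoters.
    return p_bad(ara_mM) + p_tet(atc_ng_ml)


def nor_gate(ara_mM, atc_ng_ml):
    # The summed promoter activity drives the repressor, which inverts the signal.
    return ci_not_gate(or_gate(ara_mM, atc_ng_ml))


if __name__ == "__main__":
    for ara in (0.0, 5.0):
        for atc in (0.0, 250.0):
            print(f"Ara={ara:>4} mM  aTc={atc:>5} ng/ml  "
                  f"OR={or_gate(ara, atc):8.1f}  NOR={nor_gate(ara, atc):7.1f}")
```

Under these assumed parameters the OR output is graded across the four input combinations, whereas the NOR output is high only when both inducers are absent.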

Complex logic can be designed using layers of simpler gates. An 
XOR gate is built with three NOR gates and a buffer gate (Fig. 3a). The 
output of an XOR gate is ‘on’ only when exactly one of the two inputs
is ‘on’. Four strains, each carrying a different logic gate, are used to
construct an XOR circuit. The strains are spotted onto an agar plate in
the spatial arrangement required to perform this function (Fig. 3b).
Cell 1 carries a NOR gate that uses Ara and aTc as inputs and expresses
LasI as the output. This allows cell 1 to be wired to the NOR gates in
cells 2 and 3 by means of 3OC12-HSL. Cells 2 and 3 use Ara and aTc as
their second inputs, respectively. Similarly, the output of the NOR
gates in cells 2 and 3 is RhlI, which produces C4-HSL (N-butyryl-homoserine
lactone). Cell 4 acts as a buffer gate and integrates the
outputs from cells 2 and 3 by responding to C4-HSL. The output of a
buffer gate is ‘on’ only when the input is ‘on’. The complete circuit 
consisting of all four strains behaves as a digital XOR gate with respect 
to the two inputs (Ara and aTc; Fig. 3c, d). Each intermediate colony 
performs its digital logical operations appropriately, as tested by repla- 
cing each output gene with YFP (Fig. 3c). 
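
The wiring of the four colonies can be checked with a short Boolean sketch. It assumes that the shared C4-HSL wire behaves as a logical OR of the outputs of cells 2 and 3 (either colony producing the signal is enough to switch on cell 4); the function and variable names are illustrative.

```python
def nor(a: bool, b: bool) -> bool:
    return not (a or b)


def buffer_gate(x: bool) -> bool:
    return x


def xor_circuit(ara: bool, atc: bool) -> bool:
    las_signal = nor(ara, atc)        # cell 1: NOR(Ara, aTc) -> 3OC12-HSL
    cell2 = nor(ara, las_signal)      # cell 2: NOR(Ara, 3OC12-HSL) -> C4-HSL
    cell3 = nor(atc, las_signal)      # cell 3: NOR(aTc, 3OC12-HSL) -> C4-HSL
    c4_signal = cell2 or cell3        # assumption: shared wire acts as OR
    return buffer_gate(c4_signal)     # cell 4: buffer gate reporting the result


if __name__ == "__main__":
    for ara in (False, True):
        for atc in (False, True):
            print(f"Ara={ara!s:5} aTc={atc!s:5} output={xor_circuit(ara, atc)}")
```

Cell 2 reduces to (NOT Ara AND aTc) and cell 3 to (Ara AND NOT aTc), so their combined signal is the XOR of the two inducers.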

We constructed a small library of strains that act as simple logic 
gates, most of which are components of the XOR gate (Fig. 4a). Circuit 
diagrams showing how all of the sixteen possible two-input logic gates 
can be constructed using the library are shown in Fig. 4b. Each circuit 
diagram is reproduced by the spatial arrangement of the component 
strains. None of these circuits required additional genetic manipula- 
tion. The range of induction varies from 5-fold (XOR) to 335-fold (B 
gate). The dominant contribution to the dynamic range of the complete
circuit is due to the intrinsic range of the final circuit
(Supplementary Figure 7). For example, the XOR and NAND gates are
limited by the output of PRhl. The addition of a NOT gate to this
promoter increases the dynamic ranges of the EQUAL, AND, A
IMPLY B and B IMPLY A gates, which is an effect described
previously. No degradation in the signal is observed as a function of the
number of layers. 

The calculations are robust with respect to the distance between 
colonies and the time and density at which they are spotted (Sup- 
plementary Figure 10). This robustness is partially due to the popu- 
lation averaging that occurs, which reduces the effect of cell-cell 
variation. Despite the variability in the circuit response within a colony, 
this variability is effectively averaged and thus is not propagated to the 
next layer of the circuit. The use of chemical signals and population 
averaging could represent a common design rule for achieving
computational operations robust enough to overcome the stochastic
limitations of layered circuits in individual cells. Another source of
robustness is the external clock that is implemented by delaying the
spotting of colonies for each layer. Genetic computing is asynchronous
and this may result in hazards, that is, transient incorrect outputs that
occur as a result of mismatched delays in the circuit. This is apparent
when circuits are measured in liquid culture, where the calculation is
less robust with respect to timing and cell density (Supplementary
Figure 12). To perform the calculation properly, all of the cells need
to start in the ‘off’ state. As layered computation becomes more critical
to the design of genetic programs, this will require either the
implementation of a genetic clock or the design of programs that are robust
to asynchronous computation.
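
The effect of population averaging can be illustrated with a toy simulation. It assumes independent, Gaussian cell-to-cell variation, and the mean, spread and colony sizes are hypothetical; under these assumptions the spread of the colony-level signal shrinks roughly as one over the square root of the number of cells.

```python
import random
import statistics


def colony_means(n_cells, n_colonies, mean=100.0, sd=30.0, seed=0):
    """Average output of each colony when every cell varies independently.

    All numbers are hypothetical; 'sd' stands in for cell-to-cell variability.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_colonies):
        cells = [rng.gauss(mean, sd) for _ in range(n_cells)]
        means.append(statistics.fmean(cells))
    return means


if __name__ == "__main__":
    for n_cells in (1, 10, 100, 1000):
        spread = statistics.stdev(colony_means(n_cells, n_colonies=200))
        print(f"{n_cells:>5} cells per colony: spread of colony signal = {spread:6.2f}")
```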

Figure 3 | Construction of an XOR gate by programming communication
between colonies on a plate. a, Four colonies, each composed of a strain
containing a single gate, are arranged such that the computation progresses
from left to right, with the result of each layer communicated by means of
quorum signals. The inputs (Ara and aTc) are added uniformly to the plate.
b, Spatial arrangement of the colonies. c, Each colony responds appropriately to
the combinations of input signals. Fluorescence values and their error bars are
calculated as mean ± s.d. from three experiments. d, Cytometry data for the
XOR gate (cell 4).

Figure 4 | Construction of all 16 two-input Boolean logic gates. a, Library of
simple logic gates carried by different strains (corresponding to plasmids in
Supplementary Table 5). b, Colonies containing different gates were spotted to
mimic the spatial arrangement of each logic circuit (Fig. 3b). For each circuit,
the final colony was assayed by flow cytometry for all combinations of inducers
added to the plate. The data correspond to the cytometry distributions in
Supplementary Figure 6. Fluorescence values and their error bars are calculated
as mean ± s.d. from three experiments. NIMPLY, NOT IMPLY.

Cellular automata have been used to show how simple logic yields
complex patterns in the organization of cells. These have been used to
model biological pattern formation, development and complex
collective behaviour. Here we demonstrate that a library of simple gates
can be used to form more complex computational operations by
linking the gates using diffusible chemical signals. The motif of multiple
promoters in tandem driving the expression of a repressor is common
in genomes, and the resulting NOR gates may represent a ubiquitous
fundamental unit of biological computation. Although our current
ability to create logic gates within a single cell is limited, it may
ultimately be possible to encode more complex circuits in individual cells
that are then linked by cell-cell communication, akin to logic blocks in
field-programmable gate arrays. Together, these principles can be
used in the engineering of biological systems to create increasingly
complex functions.


METHODS SUMMARY 

Strains, plasmids and media. All studies were performed using E. coli strain
DH10B. Luria-Bertani (LB)-Miller medium (Difco 244610) was used for the
assays. The antibiotics used were 50 µg ml⁻¹ chloramphenicol (Acros
227920250) and/or 50 µg ml⁻¹ kanamycin (Fisher BP906-5). The inducers used
were arabinose (Sigma A3256), anhydrotetracycline (Fluka 37919) and
3OC12-HSL (Sigma 09139).

Transfer function characterization. Cells harbouring the appropriate plasmids
were incubated in 3 ml of LB broth medium (37 °C, 250 r.p.m. shaking) in culture
tubes without inducers for 18 h. The cultures were then diluted
200-fold into 200 µl fresh LB broth medium (supplemented with appropriate
inducers) in a 96-well plate format and incubated for an additional 14 h before finally
being diluted 100-fold into PBS solution for cytometry analysis.

Plate assay of circuit function. The plate medium was prepared by pouring 12 ml
of LB broth agar medium (1.5% agar (Difco 214030), 2.5% LB-Miller) supplemented
with inducers (2 mM Ara and/or 500 ng ml⁻¹ aTc) into a 100-mm Petri dish (Fisher
08-757-13). Bacterial logic gates were ‘fabricated’ on the plate by spotting 1 µl of
overnight culture of the appropriate bacterial strains (Supplementary Table 5) to mimic
the spatial arrangement of each logic circuit. The distance between neighbouring colonies
was set at 7 mm in square grids. Each layer was spotted 12 h after the
previous layer to ensure that communication signals had propagated
sufficiently. Twelve hours after the last layer was spotted, the whole output colony of the
circuit was scraped using inoculating loops and diluted into 10 ml PBS solution for
cytometry analysis.

Flow cytometry. All data contained at least 50,000 events, obtained using BD- 
FACS LSR2. Events were gated by forward and side scatter using MATLAB soft- 
ware. The geometric means of the fluorescence distributions were calculated. The 
autofluorescence value of E. coli DH10B cells harbouring no plasmid was sub- 
tracted from these values to give the fluorescence values reported in this study. 
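
The fluorescence calculation described above can be sketched as follows. The example assumes that the autofluorescence of the plasmid-free control is also summarized by its geometric mean, which the text does not state explicitly; the function names and the event values are illustrative.

```python
import math


def geometric_mean(events):
    """Geometric mean of strictly positive fluorescence values."""
    return math.exp(sum(math.log(v) for v in events) / len(events))


def corrected_fluorescence(sample_events, blank_events):
    """Geometric mean of the sample minus that of the plasmid-free control.

    Summarizing the control by its geometric mean is an assumption.
    """
    return geometric_mean(sample_events) - geometric_mean(blank_events)


def fold_induction(on_value, off_value):
    """Ratio of corrected 'on' fluorescence to corrected 'off' fluorescence."""
    return on_value / off_value


if __name__ == "__main__":
    blank = [1.0, 100.0]            # illustrative gated events, arbitrary units
    on_state = [100.0, 10000.0]
    off_state = [4.0, 100.0]
    on_corr = corrected_fluorescence(on_state, blank)
    off_corr = corrected_fluorescence(off_state, blank)
    print(on_corr, off_corr, fold_induction(on_corr, off_corr))
```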

Noise correlations improve response fidelity and
stimulus encoding

Computation in the nervous system often relies on the integration
of signals from parallel circuits with different functional properties. 
Correlated noise in these inputs can, in principle, have diverse and 
dramatic effects on the reliability of the resulting computations.
Such theoretical predictions have rarely been tested experimentally 
because of a scarcity of preparations that permit measurement of 
both the covariation of a neuron’s input signals and the effect on a 
cell’s output of manipulating such covariation. Here we introduce a 
method to measure covariation of the excitatory and inhibitory 
inputs a cell receives. This method revealed strong correlated noise 
in the inputs to two types of retinal ganglion cell. Eliminating cor- 
related noise without changing other input properties substantially 
decreased the accuracy with which a cell’s spike outputs encoded 
light inputs. Thus, covariation of excitatory and inhibitory inputs 
can be a critical determinant of the reliability of neural coding and 
computation. 

Differences in the properties of excitatory and inhibitory synaptic 
inputs to a target cell provide a key control of neural activity. Feed- 
forward inhibitory synaptic input is a ubiquitous example. A delay in 
inhibitory input relative to excitatory input, for example by an extra 
synaptic delay in the circuit providing inhibitory input, can limit
response duration to the time window in which the target cell receives
excitatory but not inhibitory input. More generally, inhibitory input
can cancel unwanted responses by arriving before or at the same time
as excitatory input. Theoretical work illustrates how the effectiveness
of these computations depends on the strength of covariation
between excitatory and inhibitory synaptic inputs. Thus, although
synaptic noise will always decrease the reliability of the neural
response, strong noise correlations, unlike independent noise, could allow
fluctuations in inhibitory synaptic input to cancel corresponding
fluctuations in excitatory synaptic input (Fig. 1). Such noise correlations
can arise if noise within excitatory and inhibitory pathways originates
from a common source (Fig. 1, left), for example in densely and randomly
connected recurrent networks. Noise cancellation in synaptic
integration could in turn reduce trial-to-trial variability in a cell’s spike output
(Fig. 1, right). 
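
The cancellation argument can be made concrete with a deliberately simplified linear picture, in which the net synaptic drive is excitation minus inhibition. Under that assumption the variance of the net noise is the sum of the two variances minus twice their covariance, so the noise shrinks as the correlation rho grows. The function name and the unit noise amplitudes are illustrative.

```python
import math


def net_noise_sd(sd_exc, sd_inh, rho):
    """Standard deviation of (excitatory - inhibitory) noise.

    Assumes a linear combination: Var(E - I) = Var(E) + Var(I) - 2*rho*sd_E*sd_I.
    """
    variance = sd_exc**2 + sd_inh**2 - 2.0 * rho * sd_exc * sd_inh
    return math.sqrt(max(0.0, variance))  # guard against rounding below zero


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"rho = {rho:4.2f}  net noise s.d. = {net_noise_sd(1.0, 1.0, rho):.3f}")
```

Real synaptic integration is nonlinear, so this is only a guide to the direction of the effect, not to its size.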

The extent and impact of noise correlations depends on several 
network and cellular properties, including nonlinearities in synaptic 
transmission or spike generation that could decrease correlation
strength. This dependence makes it difficult to predict the importance 
of noise correlations from modelling alone or from correlations mea- 
sured in cell pairs. Work on the retina provides a rare opportunity to 
provide quantitative experimental information about how noise cor- 
relations affect the coding of physiologically relevant stimuli. Our goal 
was first to measure covariation of the excitatory and inhibitory syn- 
aptic inputs received by a retinal ganglion cell (Fig. 1, (Q1)) and then to 
test how these noise correlations affect the encoding of light stimuli in a
cell’s spike output (Fig. 1, (Q2)). 

Quantifying the covariation of excitatory and inhibitory synaptic 
input requires measuring these two conductances simultaneously or 
near simultaneously. To do this, we rapidly alternated the ganglion cell 
voltage between the reversal potentials for excitatory and inhibitory
synaptic inputs, collecting a single sample of each input every 10 ms
(Fig. 2a). Control experiments indicated that the voltage at the synaptic 
receptors had reached a near-constant value at these sampling times 
(Supplementary Fig. 1). This sampling rate is high in comparison with 
the 50-100 ms time course of a ganglion cell’s response to light inputs. 
To check how well this procedure captured light-dependent changes in 
conductance, we compared the simultaneously measured conduc- 
tances with those measured non-simultaneously when the voltage 
was held constant at the excitatory or inhibitory reversal potential. 
Mean excitatory and inhibitory conductances resulting from a 
repeated, modulated light input differed minimally (Fig. 2b). In 21 
cells, the alternating voltage approach captured 99.9 ± 0.6% of the
power of the conductance signal and 83 ± 4% of that of the
conductance noise (mean ± s.e.m.; see Methods). Thus, simultaneous
conductance measurements capture most of the structure in the synaptic
inputs a ganglion cell receives. 
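
A minimal sketch of how alternating samples could be converted into conductances is given below. It assumes an ohmic relation I = g (V − E_rev), illustrative reversal potentials of 0 mV (excitatory) and −60 mV (inhibitory), and a sample order that starts at the inhibitory reversal potential; none of these specifics are taken from the Methods.

```python
def conductances_from_alternating_clamp(currents_pA, e_exc_mV=0.0, e_inh_mV=-60.0):
    """Split interleaved current samples into excitatory and inhibitory conductances.

    Even-indexed samples are assumed to be taken at the inhibitory reversal
    potential (only excitatory current flows); odd-indexed samples at the
    excitatory reversal potential (only inhibitory current flows).
    Currents in pA divided by driving force in mV give conductances in nS.
    """
    exc_currents = currents_pA[0::2]
    inh_currents = currents_pA[1::2]
    g_exc_nS = [i / (e_inh_mV - e_exc_mV) for i in exc_currents]
    g_inh_nS = [i / (e_exc_mV - e_inh_mV) for i in inh_currents]
    return g_exc_nS, g_inh_nS


if __name__ == "__main__":
    samples = [-120.0, 60.0, -60.0, 120.0]  # illustrative currents, pA
    g_exc, g_inh = conductances_from_alternating_clamp(samples)
    print("g_exc (nS):", g_exc)
    print("g_inh (nS):", g_inh)
```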

Figure 1 | Effects of noise correlations on the variability of synaptic current
and spike output. Neural encoding consists of three basic steps: a stimulus
shapes excitatory (blue, Gexc) and inhibitory (red, Ginh) synaptic conductances;
these conductances then shape synaptic currents; and the resulting currents
control spike generation to produce a sequence of action potentials (spikes).
Noise correlations will be strong if a common source dominates noise in
excitatory and inhibitory pathways (Noisecom) and minimal if the dominant
noise source arises independently (Noise). Correlated (black traces) as
opposed to uncorrelated (green traces) noise between excitatory and inhibitory
conductances can lead to lower variability of both the synaptic current and the
spike output (shaded regions around traces). Understanding this issue requires
answering two questions. (Q1) How much do converging excitatory and
inhibitory inputs covary? (Q2) What is the impact of such noise correlations on
the neural output?

Simultaneous conductances measured during constant light input
often exhibited spontaneous excitatory synaptic events accompanied in
time by inhibitory synaptic events (Fig. 2c, top, black arrowheads). Such
events in fact typically occurred together. Correlated noise events were
rarely observed during non-simultaneously measured conductances
(Fig. 2c, bottom). Correspondingly, the cross-correlation function for
simultaneously measured excitatory and inhibitory conductances during
constant light input showed considerable structure, unlike the cross- 
correlation for non-simultaneously measured conductances (single cell: 
Fig. 2d, top; population: Fig. 2d, bottom). Thus, simultaneous conduc- 
tance recordings revealed correlations between converging synaptic 
inputs that were inaccessible from more conventional recordings. 

Figure 2 | Near-simultaneous recording of excitatory and inhibitory
synaptic input to an ON-OFF directionally selective ganglion cell. a, Light
stimulus (S) is presented while the voltage (V) of the cell alternates between the
excitatory (Eexc) and inhibitory (Einh) reversal potentials. Excitatory (blue) and
inhibitory (red) synaptic currents (I) are sampled at the end of each voltage step.
b, Conductances derived from measured currents (Methods) and averaged
across multiple repeats of the same stimulus (S). Simultaneously measured
conductances (solid lines) closely match those (dashed lines) measured
non-simultaneously with the voltage held fixed at the reversal potentials for
excitatory or inhibitory input (both excitatory and inhibitory correlations are
0.91 ± 0.01 (mean ± s.e.m.), 21 cells). a is an enlarged view of the boxed region
of b. c, Top: section of simultaneously recorded conductances during constant
light input shows correlated excitatory and inhibitory spontaneous events
(black arrowheads). Bottom: non-simultaneously recorded conductances also
show spontaneous events (green arrowheads), but they are rarely correlated.
Records have been resampled at 50 Hz for comparison with the top
conductances. d, Top: cross-correlation (mean ± s.e.m., 10 trials) of excitatory
and inhibitory conductances in an example cell during simultaneous (black)
and non-simultaneous (green) recording. Bottom: cross-correlation for all
recorded cells (mean ± s.e.m., 6 cells).

To determine both the strength of noise correlations during modulated
light input and their effect on a cell’s spike output, we first considered
midget ganglion cells, which comprise the majority of ganglion
cells in the primate retina. Midget ganglion cells receive delayed
feed-forward synaptic inhibition, where the delay reflects an extra synapse
in the circuit controlling inhibitory input. Thus, excitatory input comes
directly from bipolar cells, whereas inhibitory input comes from
amacrine cells that themselves receive input from bipolar cells. Similar
delayed feed-forward inhibition is a characteristic of many cortical
circuits, including hippocampus, cerebellum, barrel cortex and auditory
cortex. We simultaneously recorded excitatory and inhibitory
synaptic inputs during a full-field modulated light stimulus (Fig. 3a, left) 
and estimated variability in the synaptic responses by subtracting the 
average synaptic input from each individual trial (Fig. 3a, right). The 
peak correlation strength of the resulting residuals ranged from 0.15 to 
0.5 (Fig. 3b, black traces). Noise correlations in the interleaved non- 
simultaneous conductances were substantially smaller (Fig. 3b, green 
traces). Slow drift in the light response accounted for the remaining 
noise correlations in the non-simultaneous conductances (Supplemen- 
tary Fig. 2). 

Figure 3 | Strength and impact of noise correlations in synaptic inputs to
primate midget ganglion cells. a, Left: two trials of simultaneously recorded
conductances during modulated light input (grey). Right: residual
conductances (trials from left with mean subtracted), which estimate noise in
each trial. b, Left: cross-correlation (mean ± s.e.m., 12 trials) of excitatory and
inhibitory residual conductances in an example cell during simultaneous
(black) and non-simultaneous (green) recording. Right: cross-correlation for
all recorded cells (mean ± s.e.m., 15 cells). c, Logic of dynamic-clamp
experiments using simultaneous or shuffled simultaneous conductances in
place of synaptic input. d, Example spike trains from 12 dynamic-clamp trials of
simultaneous conductances (black) or their shuffled counterparts (green). SNR,
signal-to-noise ratio. e, Signal-to-noise ratio of spike trains generated from
simultaneous conductances versus that of spike trains generated from shuffled
conductances (dots). The signal-to-noise ratio for simultaneous conductances
was 1.22 ± 0.04 times higher than that for shuffled conductances
(mean ± s.e.m., 7 cells, P = 0.0015).

The alternating-voltage technique could produce artefactual noise
correlations by overshooting the appropriate reversal potentials for
excitatory or inhibitory synaptic inputs. For example, holding at a
voltage positive relative to the excitatory reversal potential could cause
an increase in the excitatory conductance to be misinterpreted as an
increase in both the excitatory and inhibitory conductances, thus lead- 
ing to an artefactual correlation. A similar logic holds if a cell is held 
more negative than the reversal potential for inhibitory input. 
However, if anything the alternating-voltage technique fell short of 
the actual reversal potentials and hence underestimated the strength 
of noise correlations (Supplementary Fig. 3). 

To determine the effect of covariation of excitatory and inhibitory 
synaptic inputs on a midget ganglion cell’s response to physiological 
inputs, we compared the pattern of spikes produced by simultaneous 
(with noise correlations) and non-simultaneous (without noise corre- 
lations) conductances in dynamic-clamp experiments (Fig. 3c and 
Supplementary Fig. 4). The non-simultaneous conductances consisted 
of shuffled pairings of simultaneously recorded excitatory and inhibi- 
tory conductances; this procedure removed noise correlations while 
holding all other statistics constant. We compared the precision of the 
spike responses to the two sets of conductances by calculating the 
signal-to-noise ratio from repeated dynamic-clamp trials (Fig. 3d; 
see Methods). In all cases, the signal-to-noise ratio was higher for 
conductances with noise correlations (Fig. 3e). Quantifying the
temporal precision of the spike responses using a spike distance metric
gave similar results (data not shown). Thus, the precision of a midget 
cell’s output in response to light stimuli depends on the covariation of 
excitatory and inhibitory synaptic inputs. 
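
The residual and shuffling procedures described above can be sketched as follows. This simplified version reports only the zero-lag correlation rather than the full cross-correlation function, breaks the trial pairing by rotating the inhibitory trials by one position (one simple way to shuffle), and uses synthetic traces; all names and values are illustrative.

```python
import random


def residuals(trials):
    """Subtract the across-trial mean trace from every trial."""
    n_trials = len(trials)
    mean_trace = [sum(samples) / n_trials for samples in zip(*trials)]
    return [[x - m for x, m in zip(trial, mean_trace)] for trial in trials]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def noise_correlation(exc_trials, inh_trials):
    """Mean zero-lag correlation between paired excitatory and inhibitory residuals."""
    pairs = zip(residuals(exc_trials), residuals(inh_trials))
    values = [pearson(e, i) for e, i in pairs]
    return sum(values) / len(values)


def shuffle_pairings(trials):
    """Re-pair trials by rotating the list by one position."""
    return trials[1:] + trials[:1]


def synthetic_trials(n_trials=10, n_samples=200, seed=1):
    """Synthetic conductances: shared stimulus + common noise + private noise."""
    rng = random.Random(seed)
    signal = [(t % 50) / 50.0 for t in range(n_samples)]
    exc, inh = [], []
    for _ in range(n_trials):
        common = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        exc.append([s + c + 0.3 * rng.gauss(0.0, 1.0) for s, c in zip(signal, common)])
        inh.append([s + c + 0.3 * rng.gauss(0.0, 1.0) for s, c in zip(signal, common)])
    return exc, inh


if __name__ == "__main__":
    exc, inh = synthetic_trials()
    print("paired:  ", round(noise_correlation(exc, inh), 3))
    print("shuffled:", round(noise_correlation(exc, shuffle_pairings(inh)), 3))
```

Because shuffling only changes which trials are paired, the mean response and the noise amplitude of each input are unchanged, and only the correlation between them is removed.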

Feed-forward synaptic inhibition can serve a more diverse func- 
tional role when the amplitude or timing of inhibitory input relative 
to excitatory input depends on the stimulus. For example, the ability of 
a subset of retinal ganglion cells to respond to the direction of a moving 
object (Fig. 4a, b) relies on cancellation of excitatory input by
inhibitory input in the non-preferred direction. Covariation of excitatory
and inhibitory synaptic inputs could make such a mechanism robust to 
noise, for example by preventing a larger-than-average excitatory syn- 
aptic event from overwhelming the corresponding inhibitory synaptic 
event and causing a response to movement in an inappropriate dir- 
ection. To test this proposal, we recorded simultaneous conductances 
in mouse ON-OFF directionally selective ganglion cells (ON-OFF 
DSGCs) in response to a bar of light moving in different directions 
(Fig. 4a). Excitatory and inhibitory conductances showed strong noise 
correlations that were largely absent from non-simultaneous conduc- 
tances (Fig. 4d; see Supplementary Fig. 5 for results from full-field light 
stimuli). Both excitatory and inhibitory conductances and the strength 
of the noise correlations depended on bar direction (Fig. 4c, d). For 
example, noise correlations in the non-preferred direction were three to 
four times stronger than those in the preferred direction. Furthermore, 
excitatory and inhibitory conductances showed near-perfect covaria- 
tion in the non-preferred direction. 

We tested the impact of noise correlations on direction tuning using 
simultaneous (with noise correlations) and non-simultaneous (with- 
out noise correlations) conductances in dynamic-clamp experiments; 
non-simultaneous conductances consisted of simultaneous conduc- 
tances shuffled between trials but not bar directions. Both the mean 
and the standard deviation of the firing rate in the non-preferred 
direction were considerably higher for non-simultaneous conduc- 
tances (Fig. 4e, f). The failure of a cell to attenuate its response reliably 
for movement in the non-preferred direction should negatively affect 
its ability to encode direction. Indeed, each recorded cell showed 
greater direction selectivity for the simultaneous conductances 
(Fig. 4g). Thus, the computation underlying directional selectivity 
depends on covariation of excitatory and inhibitory synaptic inputs 
and the resulting cancellation of noise shared between the circuits 
providing each type of input. 

Figure 4 | Strength and impact of noise correlations in synaptic inputs to
ON-OFF directionally selective ganglion cells. a, A bar of light was moved in
eight directions, at 45° increments in random order. b, Extracellular
(cell-attached configuration) spike responses to the moving bar. c, Examples of
simultaneously recorded conductances showing tuning of excitatory (blue) and
inhibitory (red) conductances. d, Simultaneous conductances (black) show
strong noise correlations that are largely absent from the non-simultaneous
conductances (green). e, Normalized directional tuning (spike count versus
direction) from a single dynamic-clamp experiment (mean ± s.d.) for 20 trials
of simultaneous or shuffled simultaneous conductances. Insets at 45° (preferred
direction) and 225° (non-preferred direction) show spike rasters. f, Standard
deviation of the normalized spike count is significantly smaller for
simultaneous trials than for shuffled trials in non-preferred directions
(135-315°; P < 0.05, 10 cells). Standard deviations in the preferred direction were
similar. g, Direction selectivity index (DSI; see Methods) is 2.0 ± 0.2 times
larger for simultaneous conductances than for shuffled conductances
(mean ± s.e.m., 10 cells, P = 0.0002).

Computation in the retina follows a basic plan found in many other
neural circuits: signals in a common population of inputs diverge to
parallel and functionally dissimilar pathways, and integration of the
signals from multiple parallel pathways governs the output of the circuit.
Divergence into separate excitatory and inhibitory circuits is a prominent
example of such a motif. Noise in shared inputs naturally causes
covariation of signals in the parallel pathways. The strength of such noise
correlations will depend on cellular properties within the network,
the stimulus delivered (Fig. 4) and the state of the network. Thus,
excitatory and inhibitory inputs to cells in some, but not all, circuits are
expected to show strong noise correlations, as indeed is the case in barrel
cortex. Here we put such noise correlations in the context of the
coding of physiologically relevant stimuli. Our results reveal a critical
role for noise correlations in maintaining appropriate cancellation of
excitatory and inhibitory inputs and thus sharpening tuning to specific
stimuli. This work provides an example of neurons that perform
computations reliant on noise correlations. Given the prevalence of circuits in
which feed-forward inhibition shapes neural responses, noise
correlations probably have a similar role in other neural circuits.


METHODS SUMMARY 


We took electrical recordings from midget ganglion cells in primate and ON-OFF 
DSGCs in mouse retinas using patch-clamp techniques as previously 
described”*”’. Light stimuli were delivered from light-emitting diodes or an 
organic light-emitting diode monitor (eMagin). Mean light levels for all experi- 
ments were near 5,000 absorbed photons per cone per second. 

The 10-ms cycle period during the simultaneous conductance recordings allows 
us to resolve input at 50 Hz and below. The fraction of the measured current 
variance at this cycle time was determined by calculating the fraction of the 
variance of the non-simultaneous (constant-voltage) conductances that can be 
accounted for by the variance of the simultaneous conductances. 

Signal-to-noise ratios of spike outputs were calculated by forming spike trains of 
zeroes and ones from each trial, with 1-ms resolution. The mean and trial residuals 
of these spike trains were calculated and the power spectra of these functions were 
assessed and corrected for sample number bias*’. Power spectra were integrated 
between 1 and 20 Hz and the result for the mean responses was divided by that for 
the residuals (Supplementary Fig. 6). 
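
The signal-to-noise calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `spike_snr` is hypothetical, and the sample-number correction shown is one common form assumed here, because the text cites the correction without stating it.

```python
import numpy as np


def spike_snr(spikes, dt=1e-3, f_lo=1.0, f_hi=20.0):
    """Signal-to-noise ratio of binary spike trains in a frequency band.

    spikes: array of shape (n_trials, n_bins) holding zeroes and ones.
    dt:     bin width in seconds (1 ms in the text).
    """
    spikes = np.asarray(spikes, dtype=float)
    n_trials, n_bins = spikes.shape

    mean = spikes.mean(axis=0)   # trial-averaged response ("signal")
    resid = spikes - mean        # per-trial residuals ("noise")

    freqs = np.fft.rfftfreq(n_bins, d=dt)
    band = (freqs >= f_lo) & (freqs <= f_hi)

    p_mean = np.abs(np.fft.rfft(mean)) ** 2
    p_resid = (np.abs(np.fft.rfft(resid, axis=1)) ** 2).mean(axis=0)

    # ASSUMED bias correction (not taken from the paper):
    # residuals about an n-trial mean underestimate noise power by (n-1)/n,
    # and the n-trial mean still carries noise power / n.
    p_noise = p_resid * n_trials / (n_trials - 1)
    p_signal = p_mean - p_noise / n_trials

    # The frequency spacing is the same in numerator and denominator,
    # so summing over the band stands in for integrating over it.
    return p_signal[band].sum() / p_noise[band].sum()
```

With few trials the corrected signal power can come out slightly negative for an unmodulated response, which is the expected behaviour of an unbiased estimate near zero.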

Spike number in ON-OFF DSGCs in response to the moving bar was summed 
over the entire duration of the bar’s movement. The direction selectivity index'® 
was calculated as $\mathrm{DSI} = \left|\sum_i \mathbf{v}_i / \sum_i r_i\right|$, where $\mathbf{v}_i$ are vectors of lengths $r_i$, equal to the
normalized firing rate, and point in the direction of the moving bar that produced 
the presented conductances. 
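
As a minimal sketch of the vector-sum index defined above, each response can be represented as a complex number whose modulus is the firing rate and whose argument is the bar direction. The function name and argument layout are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def direction_selectivity_index(directions_deg, rates):
    """DSI = |sum_i v_i| / sum_i r_i for responses to bars moving in several directions.

    directions_deg: direction of bar motion for each condition, in degrees.
    rates:          normalized firing rate (or spike count) for each condition.
    """
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    r = np.asarray(rates, dtype=float)
    total = r.sum()
    if total == 0:
        return 0.0  # no spikes in any direction: treat as untuned
    # r * exp(i*theta) is the vector v_i of length r_i pointing along the motion direction.
    return float(np.abs(np.sum(r * np.exp(1j * theta))) / total)
```

For the eight directions used here (0° to 315° in 45° steps), a cell that fires equally in all directions gives a DSI of approximately 0, and a cell that fires in only one direction gives a DSI of 1.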

Current injected into a cell (I) during dynamic-clamp experiments”! was calcu- 
lated as 


$$I(t) = G_{\mathrm{exc}}(t)\,\bigl(V(t-\Delta t) - E_{\mathrm{exc}}\bigr) + G_{\mathrm{inh}}(t)\,\bigl(V(t-\Delta t) - E_{\mathrm{inh}}\bigr)$$


where $G_{\mathrm{exc}}$ and $G_{\mathrm{inh}}$ are a pair of conductances recorded during light stimulation,
$V$ is the cell’s membrane potential, and $E_{\mathrm{exc}}$ and $E_{\mathrm{inh}}$ are reversal potentials set
respectively at 0 mV and −80 mV. Changing the inhibitory reversal potential, $E_{\mathrm{inh}}$,
to −50 mV did not substantially affect the results.
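
The dynamic-clamp expression above amounts to one multiply-and-add per update cycle. The sketch below assumes conductances in nS and potentials in mV, so the result is in pA; the function name and the unit choice are illustrative assumptions, and the sign convention is simply that of the equation as written.

```python
def dynamic_clamp_current(g_exc, g_inh, v_prev, e_exc=0.0, e_inh=-80.0):
    """Current for one dynamic-clamp update.

    g_exc, g_inh: excitatory and inhibitory conductances at time t (nS, assumed units).
    v_prev:       membrane potential sampled one cycle earlier, V(t - dt) (mV).
    e_exc, e_inh: reversal potentials (mV); defaults follow the values in the text.
    """
    return g_exc * (v_prev - e_exc) + g_inh * (v_prev - e_inh)


# Example: 1 nS of excitation and 2 nS of inhibition at V = -60 mV.
i_default = dynamic_clamp_current(1.0, 2.0, -60.0)                # -20.0 pA
i_shifted = dynamic_clamp_current(1.0, 2.0, -60.0, e_inh=-50.0)   # -80.0 pA
```

The second call illustrates the control mentioned in the text, in which the inhibitory reversal potential is moved from −80 mV to −50 mV.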


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Electrical recordings were made from midget ganglion cells in primate and ON- 
OFF DSGCs in mouse retinas as previously described’. Midget ganglion cells 
were identified by their relatively sustained response to light steps and characteristic 
morphology'””””?. ON-OFF DSGCs were identified by a combination of at least 
two of the following criteria: an on-off light response to a brief light step, a 
bistratified morphology and a directionally selective spike response. 

Light stimuli were delivered from light-emitting diodes or an organic light-emitting
diode monitor (eMagin). Mean light levels for all experiments were near 5,000
absorbed photons per cone per second. Full-field stimuli consisted of 10 s of constant
light followed by 10 s of 50%-contrast modulated light (low-pass-filtered at 60 Hz)
repeated for 5–20 trials. Moving bars were 180 µm wide, 720 µm long, moved at
864 µm s⁻¹ along the long axis and had a contrast of between 100 and 150%.

For all recordings a flat-mounted piece of retina was superfused with warmed
(31–34 °C) and oxygenated (5% CO2, 95% O2) Ames solution. Midget cell
dynamic-clamp experiments were performed with receptors mediating excitatory
and inhibitory synaptic input blocked (10 µM NBQX, 1 µM strychnine, 10 µM
gabazine). Pipettes for voltage-clamp recordings were filled with a Cs-based
internal solution (105 mM CsCH3SO3, 10 mM TEA-Cl, 20 mM HEPES, 10 mM
EGTA, 5 mM Mg-ATP, 0.5 mM Tris-GTP and 2 mM QX-314, pH ~7.3,
~280 mosM). Pipettes for dynamic-clamp experiments were filled with a
K-based internal solution (110 mM K aspartate, 1 mM MgCl2, 10 mM HEPES,
5 mM NMDG, 0.5 mM CaCl2, 10 mM phosphocreatine, 4 mM Mg-ATP and
0.5 mM Tris-GTP, pH ~7.2, ~280 mosM). Liquid junction potentials were
~10 mV and are not compensated for in the values reported throughout the text.
Low access resistance was critical, and only cells with access resistance below
20 MΩ were included for analysis. Access resistance was partially compensated
for (75% for experiments using an Axopatch 200B amplifier; 50% compensation
and prediction for experiments using a Multiclamp 700B amplifier).
Conductances were derived from excitatory and inhibitory synaptic currents by
dividing the currents by assumed driving forces corresponding to voltages of
−62 and +62 mV, respectively.
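
The conversion from synaptic currents to conductances described above is a division by the assumed driving force. A minimal sketch, assuming currents in pA and driving forces in mV so that the result is in nS (the function and parameter names are hypothetical):

```python
import numpy as np


def currents_to_conductances(i_exc_pa, i_inh_pa, df_exc_mv=-62.0, df_inh_mv=62.0):
    """Convert synaptic currents to conductances, G = I / (driving force).

    i_exc_pa:  excitatory current trace (pA); inward, so negative at -62 mV.
    i_inh_pa:  inhibitory current trace (pA); outward, so positive at +62 mV.
    df_*_mv:   assumed driving forces (mV), defaults taken from the text.
    Returns (g_exc, g_inh) in nS, since pA / mV = nS.
    """
    g_exc = np.asarray(i_exc_pa, dtype=float) / df_exc_mv
    g_inh = np.asarray(i_inh_pa, dtype=float) / df_inh_mv
    return g_exc, g_inh
```

An inward excitatory current of −124 pA therefore corresponds to 2 nS, and an outward inhibitory current of 62 pA to 1 nS.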

Both ganglion cell types showed evidence for NMDA-receptor-mediated
conductances (J-shaped I–V plots that became linear in the presence of 10 µM APV).
The presence of an NMDA conductance could cause noise correlations to be
substantially underestimated if the voltage is substantially below the excitatory
reversal potential. However, we observed only a weak impact of this conductance
when noise correlations were compared before and after application of APV.
Results from two cells recorded only in the presence of APV were included in
the full data set.

Correlations were calculated using the ‘xcov’ function in MATLAB, release
2009a (MathWorks) and normalized using the ‘coeff’ option. Briefly, this function
calculates the cross-correlation after subtracting the means from each trial and
normalizes by the geometric mean of the two autocorrelations at zero lag (see
Supplementary Information, equation (2.1)).
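
For readers without MATLAB, the normalization described above can be approximated in NumPy. This is a sketch of the same idea for two equal-length traces, not a port of ‘xcov’; the function name `xcov_coeff` is hypothetical.

```python
import numpy as np


def xcov_coeff(x, y):
    """Normalized cross-covariance of two equal-length traces.

    Means are subtracted first, and the result is divided by the geometric
    mean of the two zero-lag autocovariances, so identical traces give 1 at lag 0.
    Returns (lags, coefficients); lag k pairs x[n + k] with y[n].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be one-dimensional and of equal length")
    x = x - x.mean()
    y = y - y.mean()
    c = np.correlate(x, y, mode="full")          # length 2N - 1
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))  # geometric mean of zero-lag autocovariances
    lags = np.arange(-(len(x) - 1), len(x))
    return lags, c / norm
```

Applied to a simultaneously recorded excitatory and inhibitory conductance pair, the value at zero lag is the quantity usually reported as the noise correlation once the trial-averaged response has been removed from each trace.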


32. Polyak, S. & Willmer, E. N. Retinal structure and colour vision. Doc. Ophthalmol. 3, 
24-56 (1949). 


©2010 Macmillan Publishers Limited. All rights reserved 


doi:10.1038/nature09573 


Crystal structure of bacterial RNA polymerase bound 
with a transcription inhibitor protein 


Shunsuke Tagami, Shun-ichi Sekine, Thirumananseri Kumarevel, Nobumasa Hino, Yuko Murayama,
Syunsuke Kamegamori, Masaki Yamamoto, Kensaku Sakamoto & Shigeyuki Yokoyama


The multi-subunit DNA-dependent RNA polymerase (RNAP) is 
the principal enzyme of transcription for gene expression. 
Transcription is regulated by various transcription factors. Gre 
factor homologue 1 (Gfh1), found in the Thermus genus, is a close 
homologue of the well-conserved bacterial transcription factor 
GreA, and inhibits transcription initiation and elongation by bind- 
ing directly to RNAP’*. The structural basis of transcription 
inhibition by Gfh1 has remained elusive, although the crystal struc- 
tures of RNAP and Gfh1 have been determined separately*°. Here 
we report the crystal structure of Thermus thermophilus RNAP 
complexed with Gfh1. The amino-terminal coiled-coil domain of
Gfh1 fully occludes the channel formed between the two central
modules of RNAP; this channel would normally be used for
nucleotide triphosphate (NTP) entry into the catalytic site.
Furthermore, the tip of the coiled-coil domain occupies the NTP
β–γ phosphate-binding site. The NTP-entry channel is expanded,
because the central modules are ‘ratcheted’ relative to each other by 
~7°, as compared with the previously reported elongation com- 
plexes. This ‘ratcheted state’ is an alternative structural state, 
defined by a newly acquired contact between the central modules. 
Therefore, the shape of Gfh1 is appropriate to maintain RNAP in 
the ratcheted state. Simultaneously, the ratcheting expands the 
nucleic-acid-binding channel, and kinks the bridge helix, which 
connects the central modules. Taken together, the present results 
reveal that Gfh1 inhibits transcription by preventing NTP binding 
and freezing RNAP in the alternative structural state. The ratcheted 
state might also be associated with other aspects of transcription, 
such as RNAP translocation and transcription termination. 

RNAP synthesizes RNA complementary to the template DNA (Sup- 
plementary Fig. 1a). Crystallographic studies of RNAPs from thermo- 
philic bacteria and RNAP II (Pol II) from the yeast Saccharomyces 
cerevisiae have revealed the overall structure of RNAP, which resembles 
a crab’s claw’"’ (Supplementary Fig. 1b). In the transcribing RNAP 
(elongation complex, EC), the nascent RNA strand remains bound to 
the template DNA strand, forming an 8–9 base pair DNA•RNA hybrid.
The DNA•RNA hybrid and the downstream DNA duplex are tightly
held in the ‘primary channel’ formed between the pincers of the crab 
claw, in the EC structures of the S. cerevisiae and T. thermophilus 
RNAPs'*”". The catalytic site of nucleotide addition resides at the joint 
of the pincers. The substrate NTP is considered to enter the catalytic site 
through a pore (the ‘secondary channel’) on the back side of the crab 
claw (Supplementary Fig. 1b). The two pincers are connected by a long 
α-helix (the bridge helix), which is located at the junction of the
DNA•RNA hybrid-binding site, the downstream DNA-binding site,
and the secondary channel. The bridge helix is inherently flexible, adopt- 
ing both the continuously-helical and kinked conformations”'””*. 

In a previous study, we successfully crystallized T. thermophilus
RNAP together with DNA, RNA and Gfh1, and collected X-ray
diffraction data sets for two P2₁ crystals (crystals 1 and 2). The
nucleic-acid scaffolds employed for the crystallization included the
downstream duplex DNA, the DNA•RNA hybrid, and an upstream
RNA hairpin 10 or 11 nucleotides (nt) from the RNA 3′ end
(Supplementary Text 1, Supplementary Fig. 2a)**. In the present study,
we determined the structures of the quaternary complex of
RNAP•DNA•RNA•Gfh1 (EC•Gfh1) (Fig. 1a, b, Supplementary
Table 1, and Methods). The structures of the three independent
RNAP molecules in the asymmetric units of crystals 1 and 2 are all
similar to each other (Supplementary Text 2, Supplementary Figs
3–5). RNAP and Gfh1 showed clear electron densities (Supplementary
Fig. 6), and thus the inhibition mechanisms of Gfh1 were
unambiguously revealed, as described below. In contrast, the electron
densities of both the DNA and RNA were weak, so we only built the
partial models (Supplementary Text 3, Supplementary Figs 2, 7).

The S. cerevisiae Pol II structure consists of four rigid modules,
‘core’, ‘shelf’, ‘clamp’ and ‘jaw-lobe’, which are mobile relative to each
other’. The rigid modules of the bacterial RNAP were defined
previously, on the basis of the structures of the Thermus aquaticus core
enzyme and the T. thermophilus holoenzyme''*”* (Supplementary
Text 4). However, the conformations of the T. thermophilus RNAP
in the present EC•Gfh1 structures differ appreciably from those in the
previously reported structures of the core enzyme, the holoenzyme and
ECs'*"* (Fig. 1c, d). The large conformational differences enabled us to
redefine the rigid-body modules of T. thermophilus RNAP (Fig. 1a, b,
Supplementary Table 2, Supplementary Text 4, Supplementary Figs 8,
9). These rigid modules include the ‘core’, ‘shelf’ and ‘clamp’ modules,
which generally correspond well to those in Pol II. The exceptions are
that the active-site and dock domains belong to the ‘shelf’ module,
rather than to the ‘core’ module, and the β-domain 1 (or the protrusion
domain in Pol II) and the β flap domain (or the wall domain in Pol II)
are not included in the ‘core’ module in T. thermophilus RNAP
(Supplementary Text 4, Supplementary Fig. 8). The core and shelf
modules are the main structural elements that form the primary,
secondary and RNA-exit channels (Fig. 1a, b, Supplementary Text 4).
Gfh1 is accommodated in the secondary channel and its exterior region
(Fig. 1a, b, Supplementary Fig. 6). The N-terminal domain (NTD) of
Gfh1 forms a coiled coil and is inserted into the secondary channel. The
major relative movements among the rigid-body modules are the
‘ratcheting’ between the shelf and core modules and the ‘swinging’ of
the clamp relative to the shelf module (Fig. 1c, d, Supplementary Movies
1, 2), as described below in more detail.

The clamp module is connected to the shelf module by four loops
(switches 1, 2, 4 and 5, Supplementary Table 2). The protruded clamp
module swings relative to the shelf module by about 15° around the
centre near switch 5, and is further tilted by about 5° in EC•Gfh1 (Fig. 1a,
b, Supplementary Fig. 10, Supplementary Movie 2). Consequently,
the region of the primary channel outside the hybrid-binding site is
widened (Supplementary Text 5, Supplementary Fig. 11). Considering
that Gfh1 binding to RNAP occurs on the opposite side of the clamp
module, and that Gfh1 does not directly contact the clamp module, the
clamp swinging seems to be related to the hairpin structure in the RNA
of the nucleic acid scaffolds (Supplementary Text 5).

Figure 1 | Structure of EC•Gfh1. a, b, Overall structure of T. thermophilus
EC•Gfh1 in two orientations. c, d, Superposition of EC•Gfh1 and EC (PDB
2O5I, yellow). The core modules of the two structures are superposed
minimizing the root mean square deviation (RMSD) between Cα atoms. Two
orientations are shown. The same colour scheme is used in all figures (RNA,
orange; template DNA, blue; Gfh1, purple; shelf module, cyan; core module,
grey; clamp, green; switches 1–5, brown; hinge loop, red; other domains, dark
grey). The three regions of the bridge helix are coloured differently (N-terminal,
dark pink; central, hot pink; C-terminal, violet).

The central body of RNAP is composed of the core and shelf modules,
which are rotated by ~7° relative to each other in the present RNAP
structures, as compared with the previously-reported EC state’*"*
(Figs 1c, d, 2a, and Supplementary Movie 1). We designate this novel
RNAP state as the ‘ratcheted’ state, and the previous EC state as the ‘tight’
state. The shelf module is attached to the backboard part of the core
module (Fig. 1a), through interfaces of about 3,800 Å² in the ratcheted
state and about 3,600 Å² in the tight state (Fig. 2b, c). The shelf–core
interfaces in the two states mostly overlap (shown in green), but
there are several contact points specific to either the ratcheted state or the
tight state (shown in red and yellow, respectively). The central
overlapped area of the interfaces is mainly hydrophobic, and the rotation
axis of the ratcheting runs along it (Fig. 2b, c). The ratcheting axis forms
an angle of about 50° to the floor of the channels on the core module
(Fig. 2a). The core and shelf modules are connected by three peptide
segments, the bridge helix, the loop consisting of β Tyr 998–Met 1005
(previously designated as ‘switch 3’)’°, and the loop consisting of β′
Ala 779–Ser 782 (designated hereafter as the ‘hinge loop’) (Fig. 2d,
Supplementary Fig. 12). As the ratcheting axis runs close to β′ Pro 781
(Escherichia coli β′ Pro 502) within the hinge loop, the conformational
change due to the ratcheting is negligibly small around this loop (Fig. 2e,
f, Supplementary Fig. 12). By contrast, the bridge helix is much further
from the ratcheting axis, as compared with the other two peptide
segments (Fig. 2b, c), and the conformational change is significant, as
described below in more detail.

On the other hand, the contact points specific to the ratcheted state
are also distant from the ratcheting axis (Fig. 2b, c). In particular, an
α-helix of the shelf module (β′ 685–696) contacts a β-hairpin from one
of the α subunits (α 184–191) (Fig. 2g). This contact probably limits
the further ratcheting of the shelf module. Therefore, the ratcheted state
is mostly at one extremity in the structural spectrum of bacterial RNAP,
whereas the tight state observed in previous ECs'*"* should be the other
extremity.

The shelf module ratcheting results in a composite movement that
expands the hybrid-binding site and shifts the shelf module forward,
relative to the core module (Fig. 2e, f, Supplementary Text 6,
Supplementary Fig. 13). The bridge helix exposes its central part (β′ 1084–1092),
but buries its N-terminal part (β′ 1070–1083) and the carboxy-terminal
part (β′ 1093–1102) within the core and shelf modules, respectively.
As the two modules ratchet, the N- and C-terminal parts of the
bridge helix shift relative to each other. Consequently, the conformation
of β′ Thr 1088–Gly 1092 in the central part dramatically changes from
the continuous α-helix in the previous ECs, and the two discontinuous
α-helices are connected by the two non-helical residues, Ser 1091–Gly 1092
(Fig. 3a). It is intriguing that mutations of these two residues
reportedly affect the RNAP activity’. On the other hand, the bridge
helix is kinked in the RNAP structures of T. aquaticus core enzyme
(α2ββ′ω) and T. thermophilus holoenzyme (α2ββ′ωσ) without nucleic
acids””’. However, the present kinked conformation is quite different
from the previous ones (Supplementary Text 7 and Supplementary Fig.
14). Here, the two α-helices that are directly connected to the trigger
loop are packed against the bridge-helix C-terminal region, and are
therefore shifted together with it. To avoid steric hindrance with the
tips of these two helices, the residues β′ Thr 1088, Ala 1089, Asp 1090
and Ser 1091 (E. coli β′ Thr 790, Ala 791, Asn 792 and Ser 793)
protrude into the DNA•RNA hybrid binding site (Fig. 3b, Supplementary
Text 6, Supplementary Fig. 13). Therefore, the conformational change
of the bridge helix is likely to occur synchronously with the transition
to the ratcheted state (Supplementary Text 7, 8). The direct interaction
between Gfh1 and RNAP may affect the fine conformation of the
bridge helix, by immobilizing its N-terminal region (see below).

Within the secondary channel, the Gfh1 NTD interacts tightly with
parts of the shelf module (including the trigger loop) and the core
module (including the secondary-channel coiled coil (β′ 958–1014)
and the N-terminal part of the bridge helix), and fits particularly well
with the narrowest region of the secondary channel (Fig. 3c,
Supplementary Text 9, Supplementary Fig. 15, Supplementary Movie 1). The
interaction between the Gfh1 NTD and the N-terminal part of the bridge
helix seems to maintain the straight conformation of the N-terminal
part, and to define the kinking point of the bridge helix in EC•Gfh1. It
is impossible for the Gfh1 NTD to bind to the secondary channel in the
tight EC in the same manner, as the channel is too narrow (Fig. 3d).
Leu 33 of the Gfh1 NTD is located in the narrowest region of the
secondary channel. A Gfh1 mutant with Leu 33 replaced by Trp
(L33W) lacked transcription inhibition activity, probably because of
an inability to bind (Fig. 3e, Supplementary Fig. 16). The bulky side
chain of Trp seems to prevent the Gfh1 NTD from penetrating into
the channel. This observation also indicates that the secondary
channel cannot open beyond the width of the present ratcheted state.
Consequently, Gfh1 just fits into the well-defined ratcheted state.
Considering that Gfh1 cannot bind to RNAP in the tight state, because
of steric hindrance, it is reasonable to postulate that Gfh1 traps a
dynamically occurring, ratcheted state of RNAP. To examine this
possibility, we performed crosslinking experiments. The results showed
that an artificial disulphide bond or photo-crosslink was formed at the
ratcheted state-specific interface between the core and shelf modules,
even in the absence of Gfh1 (Supplementary Text 10, Supplementary
Figs 17–19). Therefore, RNAP might spontaneously and dynamically
alter its conformation, from the tight state to the ratcheted state,
although the result could also be explained by more localized
conformational fluctuations.

Owing to the interaction of the Gfh1 NTD with the narrowest part of
the secondary channel, NTP entry would be prevented. Moreover, the
presence of the Gfh1 NTD is compatible with the unfolded conformation
of the trigger loop, but not with the NTP-induced, folded
conformation'*'* (Supplementary Fig. 20). The tip of the Gfh1 NTD is
located near the RNAP active site (Supplementary Text 11,
Supplementary Fig. 21). The end of the tip loop occupies the binding site
of the β–γ phosphate groups in the NTP insertion step of the nucleotide
addition reaction (Fig. 3f). Furthermore, Gfh1 seems to stabilize the
kinked bridge helix (Fig. 3b). An antibiotic, streptolydigin, also reportedly
inhibited transcription by immobilizing the bridge helix in a fixed
conformation’’. Taken together, these observations provide the explanation
for the inhibition of transcription elongation by Gfh1 (Supplementary
Text 12, and Supplementary Figs 22, 23).

Figure 3 (caption fragments) | [...] translocation state (PDB 2O5I) is
superposed, and is coloured yellow. c, d, The [...] performed by incubating the
transcription elongation complex with 5 µM Gfh1 mutants. Error bars, s.d.
(n = 5). f, A close-up view of the active site. T. [...] substrate NTP,
respectively, in the nucleotide addition reaction. g, Interaction [...]

On the other hand, the C-terminal domain (CTD) of Gfh1 is bound
to the edge of the secondary-channel coiled coil of the RNAP
(Fig. 1a, b, Supplementary Text 13, Supplementary Fig. 24). The
interaction involves the hydrophobic patch on the surface of the Gfh1 CTD
(Fig. 3g). We therefore prepared a Gfh1 protein with the M125E
mutation within the hydrophobic patch, and observed a loss of inhibition
activity (Fig. 3e). Thus, the interaction between the hydrophobic
patch of the Gfh1 CTD and the secondary-channel coiled coil is
required for Gfh1 to bind to RNAP. Other Gre factors, such as GreA
and GreB, share structural and sequence similarities with Gfh1 (refs 2–8).
In particular, the presence of the hydrophobic patch on the CTD is well
conserved. In fact, the M124E mutation of E. coli GreB (M125E of Gfh1)
also reduces the transcript cleavage activity of GreB’. Therefore, the Gre
factors seem to share a common interaction mode between the
hydrophobic patch and the secondary-channel coiled coil, and probably bind
to the ratcheted EC in a similar manner to Gfh1 (Supplementary Text 13,
14). Transcript cleavage stimulated by GreA and GreB would be
performed in the ratcheted state. Although Pol II EC also changes its
structure upon TFIIS binding’*”’, the observed change is much smaller than
the transition to the ratcheted state of T. thermophilus EC upon Gfh1
binding (Supplementary Text 15).

The conformational changes of RNAP observed in the present struc- 
ture, including the shelf module ratcheting and the clamp swinging, 
might have functional relevance to other stages of the transcription 
reaction, as the conformational changes should modulate the inter- 
actions of RNAP with nucleic acids. Therefore, we suggest that the 
conformational changes may play distinct roles in RNAP translocation 
and transcription termination (Supplementary Text 16, 17, Supplementary
Figs 25–27). Experimental tests of these hypotheses will
be required to assess the importance of these conformational changes 
in the absence of Gfh1. 


METHODS SUMMARY 

Structure determination. The structure of crystal 1 was solved by molecular
replacement, using the coordinates of RNAP in T. thermophilus EC (PDB
2O5I)" as the search model. There are three RNAP molecules in the asymmetric
unit, and each RNAP is bound with Gfh1. As the relative positions of the Gfh1
NTD and CTD differ from those in free Gfh1 (PDB 2F23)’, they were separately
placed in the electron density map. The model of the DNA•RNA hybrid was built
in the extra electron density in the DNA•RNA hybrid channel. We further remodelled
the coordinates of both the proteins and the nucleic acids with the program Coot”.
Atomic positions and grouped B-factors were refined to 4.1 Å, by using the CNS
program” (Supplementary Table 1). The refinement converged to R and Rfree values
of 26.2% and 31.8%, respectively; the latter was calculated from a randomly chosen 3%
of reflections excluded from the refinement. The structure of crystal 2 was solved by
molecular replacement, using the coordinates of the RNAP in crystal 1 as the search
model. The models of Gfh1 and the DNA•RNA hybrid were built in the extra electron
density. Refinement of the coordinates was performed to 4.3 Å with CNS. The final R
and Rfree values are 31.4% and 33.8%, respectively.

Transcription inhibition analysis. We prepared two mutant Gfh1 proteins
(L33W and M125E). The elongation complex was reconstituted by incubating T.
thermophilus RNAP with a nucleic acid scaffold containing template DNA,
non-template DNA, and RNA. The RNA was 5′-radiolabelled using T4 polynucleotide
kinase and [γ-³²P]-ATP. The nucleotide addition reaction was performed by
incubating the transcription elongation complex in the presence of Gfh1 (the wild type
or one of the mutants).


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Structure determination. Crystallization and data collection were described
previously**. The data were reprocessed with the XDS program’. The structure
for crystal 1 was solved by molecular replacement with the program Phaser”,
using the coordinates of the core enzyme portion of T. thermophilus EC (PDB
2O5I)" as the search model. The asymmetric unit contains three RNAP molecules.
Each RNAP was divided into 25–26 rigid bodies, and their positions were refined
with the program CNS version 1.2°°”’. Several of the rigid bodies deviated
substantially from the electron density, and they were manually adjusted to the density
with the program Coot”. For the tip portion of the β′ non-conserved domain
(β′ NCD, β′ 132–454), the coordinates of the T. thermophilus holoenzyme (PDB
3DXJ)™* were used. Several rounds of rigid body refinement and manual
adjustment were performed. In each RNAP molecule, extra electron density,
corresponding to Gfh1, was observed. The NTD and CTD coordinates of free Gfh1 (PDB
2F23)’ were separately placed in the 2Fo − Fc electron density map. One of
the RNAP molecules in the asymmetric unit exhibited extra electron density
corresponding to the DNA•RNA hybrid in the DNA•RNA hybrid binding site,
for which we built the hybrid model. The coordinates of the DNA•RNA hybrid in
S. cerevisiae EC (PDB 2VUM)” were used as the starting model. The electron
density for the nucleic acids in the other two RNAP complexes was weak, probably
owing to low occupancy and/or high mobility, and therefore, we did not build their
models.

The structures of the N- and C-terminal parts of the bridge helix in EC•Gfh1 are
similar to those in the previous EC (2O5I)”’, while the central part of the bridge helix
in the present complex exhibits a conformational change, due to the ratcheting of
the core and shelf modules. The region of β′ 1086–1090 assumes a helical, but
slightly curved conformation, and the model was built by adjusting the
corresponding region in the previous EC (2O5I) to the electron density. Most parts of the bridge
helix (β′ 1070–1090 and β′ 1093–1102) maintained the helical conformation.
β′ Ser 1091 and β′ Gly 1092 were placed to link the two discontinuous helices, while
fitting their main chains into the electron density. For this rebuilding, the position of
β′ Tyr 1093, which was identified by the electron density of its large side chain, was
helpful (Fig. 3a). The coordinates of both the proteins and the nucleic acids were
further refined with the program Coot. The atomic positions and the grouped
B-factors were refined to 4.1 Å, by using CNS with strong NCS restraints among
the three complexes in the asymmetric unit (Supplementary Table 1). Refinement
was monitored by Rfree, calculated from 3% of the reflections that were excluded
from the refinement.

The structure for crystal 2 was solved by molecular replacement, using the
coordinates of RNAP and Gfh1 in crystal 1 as the search model. A model of
the DNA•RNA hybrid was placed in each RNAP in the electron density map,
and positional refinement of the coordinates was performed to 4.3 Å with CNS.
For the calculation of Rfree, the same reflections as those chosen for crystal 1 were
used.

The rigid bodies used in the structural refinement allowed us to define the
mobile modules of T. thermophilus RNAP. We first superposed the RNAPs in
the present EC•Gfh1 and the previous EC (2O5I)”* by certain rigid bodies, and
then inspected the rigid bodies that superposed well concurrently. The masses of
the co-relocated rigid bodies were defined as modules. Finally, we used the
RigidFinder program” to confirm that the defined mobile modules relocate separately.
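
The superposition step described above, and a relative rotation such as the ~7° ratcheting, can be illustrated with a least-squares (Kabsch) fit of matched Cα coordinates. This is a generic sketch under stated assumptions, not the procedure implemented in CNS, Coot or RigidFinder; the function names are hypothetical, and the two coordinate arrays are assumed to list the same atoms in the same order.

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMSD between R @ p + t and q.

    P, Q: arrays of shape (n_atoms, 3) with matched coordinates (for example Ca atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)                    # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = qc - R @ pc
    return R, t


def rotation_angle_deg(R):
    """Rotation angle (degrees) encoded by a 3 x 3 rotation matrix."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def rmsd(P, Q, R, t):
    """Root mean square deviation between the transformed P and Q."""
    diff = (np.asarray(P, dtype=float) @ R.T + t) - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

One way to use these helpers for the comparison in the text: fit the core-module atoms of one structure onto the other, apply that transform to the whole structure, then fit the shelf-module atoms alone; the rotation angle of the second fit estimates how far the shelf has moved relative to the core.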

Disulphide-bonding assay for the ratcheted RNAP. We constructed a plasmid
that allows co-expression of the α, β, β′ and ω subunits of T. thermophilus RNAP
in E. coli, for the preparation of recombinant T. thermophilus RNAP (pRpoBCAZ,
to be published elsewhere). The recombinant RNAPs (the wild type and the α
Q188C–β′ D685C mutant) were expressed using this system. They were purified
by the procedure used for the natural RNAP core enzyme from T. thermophilus
cells*®, except that the cell lysate was heat-treated at 70 °C for 30 min, in order to
denature most of the non-thermophilic E. coli proteins. Then, the RNAPs
were fractionated by polyethyleneimine precipitation, followed by ammonium
sulphate precipitation. The recombinant RNAPs were further purified by
chromatography on Q-Sepharose and Superdex pg200 columns (GE Healthcare
Biosciences).

For the disulphide-bonding analysis, the recombinant RNAP (the wild type or 
the α Q188C–β′ D685C mutant) was dissolved in 75 mM Tris-HCl buffer (pH 8.1), 
containing 50 mM KCl, 10 mM MgCl2 and 1 mM DTT. Each of the RNAPs 
(0.2 µM) was incubated with 0.7 µM of the nucleic acid scaffold (DNATS/ 
DNANT/RNA14) or 10 µM of T. thermophilus Gfh1 for 30 min. Then, 2.5 mM 
glutathione disulphide (GSSG) was added to each RNAP solution for the mild 
oxidation of Cys residues. The mixtures were analysed by SDS-PAGE, using 
sample buffer lacking a reducing agent. The formation of a disulphide bond 
between α C188 and β′ C685 was confirmed by the appearance of an extra band 
with low mobility, which corresponded to the crosslinked α and β′ subunits 
(Supplementary Fig. 17). The sequences of the nucleic acids are as follows. The 
RNA oligomer: RNA14, UUUUUGAGUCUGCGGCGAU. The DNA oligomers: 
DNATS, AACATACGGCTCGGACAGAGGTCCTGTCTGAATCGATATCGCCGC; 
DNANT, CGATTCAGACAGGACCTCTGTCCGAGCCGTATGTT. The 
nucleic acid scaffold was designed by modifying the previously reported EC14 
scaffold, which forms a stable elongation complex with T. thermophilus RNAP. 
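
The base pairing implied by the printed scaffold sequences can be checked with a short script. This is only a sketch: it assumes that the sequences are written 5′ to 3′ exactly as printed above and that pairing is plain Watson-Crick. The function names are illustrative, and the hybrid length it reports is what the printed sequences imply, not a value stated in the text.

```python
# Sequences as printed in the text, assumed to be written 5'->3'.
RNA14 = "UUUUUGAGUCUGCGGCGAU"
DNATS = "AACATACGGCTCGGACAGAGGTCCTGTCTGAATCGATATCGCCGC"  # template strand
DNANT = "CGATTCAGACAGGACCTCTGTCCGAGCCGTATGTT"            # non-template strand

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Reverse complement of a DNA string (5'->3' in, 5'->3' out)."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def rna_paired_with(template_segment):
    """RNA (5'->3') that is Watson-Crick complementary to a DNA segment (5'->3')."""
    return reverse_complement(template_segment).replace("T", "U")


# Does the non-template strand pair with the 5' portion of the template strand?
duplex_ok = DNATS.startswith(reverse_complement(DNANT))

# Lengths n for which the last n RNA bases pair with the last n template bases.
matches = [n for n in range(1, len(RNA14) + 1)
           if rna_paired_with(DNATS[-n:]) == RNA14[-n:]]
hybrid_len = max(matches)

# Template bases just 5' of the hybrid dictate the next two nucleotides added.
next_nt_1 = rna_paired_with(DNATS[-(hybrid_len + 1)])
next_nt_2 = rna_paired_with(DNATS[-(hybrid_len + 2)])

print(duplex_ok, hybrid_len, next_nt_1, next_nt_2)
```

Under these assumptions the next two templated nucleotides come out as A and then U, which agrees with the use of ATP for the RNA14 complex and UTP for the RNA15 complex in the transcription inhibition analysis.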
Photo-crosslinking assay of the ratcheted RNAP. p-Benzoyl-L-phenylalanine 
(pBpa) is a photo-crosslinker that can be position-specifically incorporated into 
a recombinant protein. The gene encoding a pBpa-specific variant of 
Methanococcus jannaschii tyrosyl-tRNA synthetase, under the control of the 
E. coli tyrS promoter, was cloned in the pACYC184 vector, together with three 
copies of the M. jannaschii amber suppressor tRNA gene, to create a vector for 
the expression of the pBpa-specific tRNA synthetase and tRNA (ppbpaRS-3MJR1). 
Each tRNA gene had an E. coli lpp promoter and an rrnC terminator. 
The artificial operons for overproducing minor tRNA species, including the minor 
tRNA’"?, were described previously"’, and were cloned in a kanamycin-resistant 
plasmid carrying the CloDF13-derived replication origin, to create pMINOR2. 

The rpoA gene, C-terminally tagged with FLAG, was engineered, using a 
QuikChange mutagenesis kit (Stratagene), to have an amber codon in place of 
Arg 185, for producing the RNAP α-subunit with Arg 185 replaced with pBpa 
(pBpa 185) (ref. 41). The rpoC gene was engineered to have a methionine codon 
in place of Glu 692, for producing the β′ subunit with the E692M substitution. The 
rpoA and rpoC genes in pRpoBCAZ were replaced by these mutant genes, and the 
vector was introduced into BL21 Star(DE3) cells (Invitrogen) harbouring the 
ppbpaRS-3MJR1 and pMINOR2 plasmids. The cells were grown in LB medium 
containing 1 mM pBpa, and the gene expression was induced by the addition of 
1 mM IPTG at the mid-log phase. After a further 4-h incubation, the cells were 
harvested and lysed by sonication in buffer A (40 mM Tris-HCl (pH 7.7), 500 mM 
NaCl, 10 mM EDTA, 10 mM 2-mercaptoethanol, 5% glycerol, and Complete 
protease inhibitor cocktail tablets (Roche)). The wild-type and engineered 
RNAP core enzymes were roughly purified by heat-treatment. For the 
photo-crosslinking, the proteins were exposed to light at 365 nm for 30 min on ice in 
a 24-well cell culture plate (BD Biosciences), followed by SDS-PAGE, Coomassie 
brilliant blue staining, and western blotting with an anti-FLAG antibody (Sigma) 
(Supplementary Fig. 19a, 19b). The RNAP core enzymes were further purified as 
described above. Then, 2 µM RNAP (wild type or mutant) was incubated with the 
nucleic acid scaffold (2.5 µM) or Gfh1 (10 µM) for 30 min at room temperature in 
50 mM HEPES-NaOH buffer (pH 7.5), containing 50 mM KCl, 10 mM MgCl2 and 
1 mM DTT, followed by the photo-crosslinking step (Supplementary Fig. 19c). 
Transcription inhibition analysis. In the previous study, we constructed a plasmid 
for the expression of wild-type Gfh1 in E. coli. The expression plasmids for Gfh1 
variants (L33W and M125E) were generated by introducing mutations to the plasmid 
encoding wild-type Gfh1. In addition, we constructed plasmids for the expression 
of wild-type and mutant T. thermophilus GreA in E. coli. The wild-type and mutant 
Gre proteins were expressed as described previously, and were then purified by 
chromatography on Toyopearl Super-Q and Butyl columns (Tosoh Bioscience). 
The transcription elongation complex was reconstituted by incubating 0.1 µM of 
T. thermophilus RNAP with 0.1 µM of the nucleic acid scaffold (DNATS/DNANT/ 
RNA14 or RNA15) for 30 min, where the sequence of RNA15 is 
UUUUUGAGUCUGCGGCGAUA. The RNA was 5′-radiolabelled using T4 polynucleotide 
kinase and [γ-32P]-ATP. The nucleotide addition reaction was performed by 
incubating the transcription elongation complex of RNA14 with 5 µM Gfh1 (wild 
type or a mutant) at 20 °C in 50 mM MES-NaOH buffer (pH 6.5), containing 
50 mM KCl, 10 mM MgCl2, 1 mM DTT and 20 µM ATP, and that of RNA15 with 
2.5 µM Gfh1 (wild type or a mutant) at 55 °C in 50 mM MES-NaOH buffer (pH 
6.5), containing 50 mM KCl, 1 mM MgCl2, 1 mM DTT and 20 µM UTP. The RNA 
was analysed by denaturing (8 M urea) PAGE. 

A substantial population of low-mass stars in 
luminous elliptical galaxies 

Pieter G. van Dokkum & Charlie Conroy 


The stellar initial mass function (IMF) describes the mass distribution 
of stars at the time of their formation and is of fundamental 
importance for many areas of astrophysics. The IMF is reasonably 
well constrained in the disk of the Milky Way but we have very 
little direct information on the form of the IMF in other galaxies 
and at earlier cosmic epochs. Here we report observations of the 
Na I doublet and the Wing-Ford molecular FeH band in the 
spectra of elliptical galaxies. These lines are strong in stars with 
masses less than 0.3 M☉ (where M☉ is the mass of the Sun) and are 
weak or absent in all other types of stars. We unambiguously 
detect both signatures, consistent with previous studies that were 
based on data of lower signal-to-noise ratio. The direct detection of 
the light of low-mass stars implies that they are very abundant in 
elliptical galaxies, making up over 80% of the total number of stars 
and contributing more than 60% of the total stellar mass. We infer 
that the IMF in massive star-forming galaxies in the early Universe 
produced many more low-mass stars than the IMF in the Milky 
Way disk, and was probably slightly steeper than the Salpeter form 
in the mass range 0.1 M☉ to 1 M☉. 

We obtained spectra of eight of the most luminous and massive 
galaxies in the nearby Universe: four of the brightest early-type galaxies 
in the Virgo cluster and four in the Coma cluster. The galaxies were 
selected to have velocity dispersions σ > 250 km s⁻¹, and were 
observed with the Low-Resolution Imaging Spectrometer (LRIS) 
on the Keck telescope. In 2009 the red arm of LRIS was outfitted with 
fully depleted charge-coupled devices (CCDs), which have excellent 
sensitivity out to wavelengths of λ > 9,000 Å and almost no fringing. 
The individual spectra of the four galaxies in each of the two clusters 
were de-redshifted, averaged and binned to a resolution of 8 Å. 

In Fig. 1b and c we show the spectral region near the λ = 8,183, 
λ = 8,195 Na I doublet for the Virgo and Coma galaxies. The doublet 
appears as a single absorption feature due to Doppler broadening. In 
Fig. 1e and f we show the region around the λ = 9,916 Wing-Ford band 
for the Virgo galaxies. This region could not be observed with sufficient 
signal in Coma because it is redshifted to 1.015 µm. The spectra are of 
very high quality. The median 1σ scatter of the four galaxies around the 
average spectrum is only about 0.3% per spectral bin. The median 
absolute difference between the Virgo and Coma spectra is 0.4% per 
spectral bin. 

Both the Na I doublet and the Wing-Ford band are unambiguously 
detected. The central wavelength of the observed Na I line coincides 
with the weighted average wavelength of the doublet and the observed 
Wing-Ford band has the characteristic asymmetric profile reflecting 
the A⁴Δ to X⁴Δ transition of FeH (ref. 5). The Na I index is 
0.058 ± 0.006 mag in the Virgo galaxies and 0.057 ± 0.007 mag in 
Coma. The Wing-Ford index in Virgo galaxies is 0.027 ± 0.005. The 
uncertainties are determined from the scatter among the individual 
galaxies. We note that any residual systematic problems with the 
detector or atmosphere are incorporated in this scatter, as the features 
were originally redshifted to a different observed wavelength range for 
each of the galaxies. 


The immediate implication is that stars with masses less than 0.3 M☉ 
are present in substantial numbers in the central regions of elliptical 
galaxies. Such low-mass stars are impossible to detect individually in 
external galaxies, because they are too faint: Barnard’s star would have 
a K-band magnitude of 39 at the distance of the Virgo cluster. This in 
turn implies that there was a channel for forming low-mass stars in the 
progenitors of luminous early-type galaxies in clusters. These 
star-forming progenitors are thought to be relatively compact galaxies at 
z = 2–5 with star-formation rates of tens or hundreds of solar masses 
per year. Some studies have suggested truncated IMFs for such galaxies, 
with a cut-off below 1 M☉. Such dwarfless IMFs are effectively ruled out 
by the detection of the Na I lines and the Wing-Ford band. 
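
The quoted K-band magnitude of about 39 follows from the inverse-square law written as a distance modulus. The sketch below is illustrative only: the present-day K magnitude and distance of Barnard’s star and the distance to Virgo are approximate values assumed here, not numbers given in the text.

```python
import math


def apparent_mag_at(m_obs, d_obs_pc, d_new_pc):
    """Apparent magnitude of the same source moved from d_obs_pc to d_new_pc."""
    return m_obs + 5.0 * math.log10(d_new_pc / d_obs_pc)


# Assumed, approximate inputs (not taken from the text).
K_BARNARD = 4.5        # apparent K-band magnitude of Barnard's star
D_BARNARD_PC = 1.83    # distance to Barnard's star in parsecs
D_VIRGO_PC = 16.5e6    # distance to the Virgo cluster in parsecs

k_at_virgo = apparent_mag_at(K_BARNARD, D_BARNARD_PC, D_VIRGO_PC)
print(round(k_at_virgo, 1))
```

With these inputs the result lands close to the magnitude of 39 quoted above, far below the detection limit for individual stars.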

We turn to stellar population synthesis models to quantify the 
number of low-mass stars in elliptical galaxies. As discussed in detail 
in the Supplementary Information, we use a flexible stellar population 
synthesis code combined with an extensive empirical library of stellar 
near-infrared spectra. In Fig. 1 we show synthetic spectra in both 
spectral regions for different choices of the IMF, including IMFs 
that are steeper than the Salpeter form. Away from the Na I doublet 
and the Wing-Ford band all models fit very well, with differences 
between data and the best-fitting model of less than 0.5% over the 
entire spectral range. Predicted Na I and Wing-Ford line indices are 
compared to the observed values in Fig. 2. The data prefer IMFs with 
substantial dwarf populations. The best fits are obtained for a logarithmic 
IMF slope of x ~ −3, a more dwarf-rich (‘bottom-heavy’) IMF than even 
the Salpeter form, which has x = −2.35. A Kroupa IMF (which is 
appropriate for the Milky Way) is inconsistent with the Wing-Ford data at 
>2σ and inconsistent with the Na I data at >4σ, as are IMFs with even 
more suppressed dwarf populations. We note that the x = −3 IMF 
also provides a much better fit to the region around 0.845 µm than any 
of the other forms. Taking the Salpeter IMF as a limiting case, we find 
that stars with masses of 0.1 M☉ to 0.3 M☉ make up at least 80% of the 
total number of living stars in elliptical galaxies, and contribute at least 
60% of the total stellar mass. 
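
The number fraction quoted for the Salpeter limiting case can be approximated by integrating a single power law, dN/dm ∝ m^x. The sketch below is a simplification under stated assumptions: living stars are taken to span 0.1 M☉ to 1 M☉ (the upper limit is assumed here for an old population), and stellar remnants are ignored, so the mass fraction it returns is not expected to reproduce the paper’s figure exactly. The function names are illustrative.

```python
def powerlaw_integral(exponent, lo, hi):
    """Integral of m**exponent dm from lo to hi (exponent must not equal -1)."""
    p = exponent + 1.0
    return (hi ** p - lo ** p) / p


def dwarf_fractions(x, m_lo=0.1, m_cut=0.3, m_hi=1.0):
    """Number and mass fractions of stars in [m_lo, m_cut] for dN/dm ∝ m**x.

    m_hi = 1.0 is an assumed upper mass for living stars; remnants are ignored.
    """
    number = powerlaw_integral(x, m_lo, m_cut) / powerlaw_integral(x, m_lo, m_hi)
    mass = (powerlaw_integral(x + 1.0, m_lo, m_cut)
            / powerlaw_integral(x + 1.0, m_lo, m_hi))
    return number, mass


salpeter_number, salpeter_mass = dwarf_fractions(-2.35)
steep_number, steep_mass = dwarf_fractions(-3.0)
print(round(salpeter_number, 2), round(steep_number, 2))
```

Under these assumptions the Salpeter slope gives a number fraction of roughly 0.8, and the steeper x = −3 slope gives a larger one, in line with treating Salpeter as the limiting case.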

Although the formal uncertainty in the derived IMF slope is small, 
we stress that some unknown systematic effect could be present in the 
stellar population synthesis modelling. In particular, weak features in 
the spectra of giant stars in elliptical galaxies may be incorrectly repre- 
sented by the Milky Way giants that we use. It may also be that the Na 
abundance of low-mass stars in elliptical galaxies is different from that 
of low-mass stars in the Milky Way. The fact that all models fit the 
spectra of both the Virgo and Coma galaxies extremely well outside of 
the IMF-sensitive regions gives some confidence in our approach; as 
we show in the Supplementary Information, the quality of the fit 
constrains possible contamination of spectral features such as TiO 
lines. 

Besides model uncertainties, the interpretation may be complicated 
by the fact that we are constraining the IMF some ten billion years after 
the stars were formed. It is now generally thought that elliptical galaxies 
have undergone several (or many) mergers with other galaxies after 
their initial collapse, which may imply that the stellar population is 

Figure 1 | Detection of the Na I doublet and the Wing-Ford band. a, Spectra 
in the vicinity of the λ = 8,183, λ = 8,195 Na I doublet for three stars from the 
IRTF library: a K0 giant, which dominates the light of old stellar populations; 
an M6 dwarf, the (small) contribution of which to the integrated light is 
sensitive to the form of the IMF at low masses; and an M3 giant, which has 
potentially contaminating TiO spectral features in this wavelength range. 

b, Averaged Keck/LRIS spectra of NGC 4261, NGC 4374, NGC 4472 and 
NGC 4649 in the Virgo cluster (black line) and NGC 4840, NGC 4926, IC 3976 
and NGC 4889 in the Coma cluster (grey line). Four exposures of 180 s were 
obtained for each galaxy. The one-dimensional spectra were extracted from the 
reduced two-dimensional data by summing the central 4″, which corresponds 
to about 0.4 kpc at the distance of Virgo and about 1.8 kpc at the distance of 
Coma. We found little or no dependence of the results on the choice of aperture. 


Coloured lines show stellar population synthesis models for a dwarf-deficient 
‘bottom-light’ IMF, a dwarf-rich ‘bottom-heavy’ IMF with x = −3, and an 
even more dwarf-rich IMF. The models are for an age of 10 Gyr and were 
smoothed to the average velocity dispersion of the galaxies. The x = −3 IMF 
fits the spectrum remarkably well. c, Spectra and models around the 
dwarf-sensitive Na I doublet. A Kroupa IMF, which is appropriate for the Milky Way, 
does not produce a sufficient number of low-mass stars to explain the strength 
of the absorption. An IMF steeper than Salpeter appears to be needed. 

d–f, Spectra and models near the λ = 9,916 Wing-Ford band. The observed 
Wing-Ford band also favours an IMF that is more abundant in low-mass stars 
than the Salpeter IMF. All spectra and models were normalized by fitting low- 
order polynomials (excluding the feature of interest). The polynomials were 
quadratic in a, b, d and e and linear in c and f. 

Figure 2 | Constraining the IMF. a, Various stellar IMFs, ranging from a 
‘bottom-light’ IMF with strongly suppressed dwarf formation (light blue) to 
an extremely ‘bottom-heavy’ IMF with a slope x = −3.5. The IMFs are 
normalized at 1 M☉, because stars of approximately one solar mass dominate 
the light of elliptical galaxies. b, Comparison of predicted Na I and Wing-Ford 
line indices with the observed values. The indices were defined to be analogous 
to those in refs 4 and 8. The Na I index has central wavelength 0.8195 µm and 
side bands at 0.816 µm and 0.825 µm. The Wing-Ford index has central 
wavelength 0.992 µm and side bands at 0.985 µm and 0.998 µm. The central 
bands and side bands are all 20 Å wide. Both observed line indices are much 
stronger than expected for a Kroupa IMF. The best fits are obtained for IMFs 
that are slightly steeper than Salpeter. 
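
A band index of this kind can be sketched as follows. The exact definition used in the paper follows refs 4 and 8 and is not reproduced in this text, so the form below is an assumption: the mean flux in the central band is compared with a continuum interpolated linearly between the two side bands, and the ratio is expressed in magnitudes. The synthetic spectrum and all function names are hypothetical.

```python
import math


def band_mean(wave, flux, centre, width):
    """Mean flux of the samples lying within centre ± width/2."""
    values = [f for w, f in zip(wave, flux) if abs(w - centre) <= width / 2.0]
    if not values:
        raise ValueError("no samples fall inside the band")
    return sum(values) / len(values)


def line_index_mag(wave, flux, centre, blue, red, width=20.0):
    """Index in magnitudes: central-band flux relative to interpolated continuum."""
    f_line = band_mean(wave, flux, centre, width)
    f_blue = band_mean(wave, flux, blue, width)
    f_red = band_mean(wave, flux, red, width)
    # Linear interpolation of the side-band fluxes to the central wavelength.
    t = (centre - blue) / (red - blue)
    f_cont = f_blue + t * (f_red - f_blue)
    return -2.5 * math.log10(f_line / f_cont)


# Hypothetical test spectrum: flat continuum with a 5% deep, 20 Å wide dip.
wave = [8100.0 + i for i in range(200)]  # wavelengths in Å
flux = [0.95 if abs(w - 8195.0) <= 10.0 else 1.0 for w in wave]

# Band centres from the caption, converted from µm to Å.
na_index = line_index_mag(wave, flux, 8195.0, 8160.0, 8250.0)
print(round(na_index, 3))
```

In this toy case a 5% flux deficit across the central band corresponds to an index of a few hundredths of a magnitude, the same order as the measured Na I index.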

more complex than our single-age, single-metallicity model. On the 
other hand, these accretion events probably mostly add stars at large 
radii and may not have affected the core regions very much. It will be 
interesting to search for gradients in dwarf-sensitive features with 
more extensive data. 

Our results are consistent with previous studies of the near-infrared 
spectra of elliptical galaxies. They are also consistent with recent 
dynamical and lensing constraints on the IMF in elliptical galaxies with 
large velocity dispersions and directly identify the stars that are 
responsible for their high masses: the dynamical data cannot distinguish 
dwarf-rich IMFs from dwarf-deficient IMFs because the latter 
have a large amount of mass in stellar remnants. A steep IMF for 
elliptical galaxies is also qualitatively consistent with the apparently 
higher number of low-mass stars in the Milky Way bulge than in the 
disk. Our best-fitting IMF does not appear to be consistent with the 
observed colour and M/L evolution of massive cluster galaxies, which 
suggest an IMF with a slope x ~ −1 around 1 M☉. Interpreting the 
evolution of the colours and luminosities of elliptical galaxies relies on 
the assumption that these galaxies evolve in a self-similar way, which 
may not be valid. It could also be that the form of the IMF is more 
complex than a power law. 

Our results also seem inconsistent with theoretical arguments for 
dwarf-deficient IMFs at high redshift, which have centred on the idea 
that the characteristic mass of stars scales with the Jeans mass in 
molecular clouds. The Jeans mass has a strong temperature 
dependence and it has been argued that relatively high ambient 
temperatures in high-redshift star-forming galaxies may have set a lower 
boundary to the characteristic mass in the progenitors of elliptical 
galaxies. However, the Jeans mass also scales with density, and 
the gas densities in the star-forming progenitors of the cores of elliptical 
galaxies were almost certainly significantly higher than typical densities 
of star-forming regions in the Milky Way. Numerical simulations 
suggest that the formation of low-mass stars becomes inevitable if 
sufficiently high densities are reached on sub-parsec scales. Furthermore, 
recent semi-analytic models of the thermal evolution of gas clouds have 
emphasized the effects of dust-induced cooling, which is relatively 
insensitive to the ambient temperature and particularly effective at high 
densities. Timescale arguments suggest that the physical conditions 
expected in starburst galaxies at high redshift might even enhance 
low-mass star formation, rather than suppress it. 

Taken at face value, our results imply that the form of the IMF is not 
universal but depends on the prevailing physical conditions: Kroupa- 
like in quiet, star-forming disks and dwarf-rich in the progenitors of 
massive elliptical galaxies. This informs models of star formation and 
has important implications for the interpretation of observations of 
galaxies in the early Universe. The stellar masses and star-formation 
rates of distant galaxies are usually estimated from their luminosities, 
assuming some form of the IMF. Our results suggest that a different 
form should be used for different galaxies, greatly complicating the 
analysis. The bottom-heavy IMF advocated here may also require a 
relatively low fraction of dark matter within the central regions of 
nearby massive galaxies. 

The mechanism of sodium and substrate release from 
the binding pocket of vSGLT 

Akira Watanabe, Seungho Choe, Vincent Chaptal, John M. Rosenberg, Ernest M. Wright, Michael Grabe & Jeff Abramson 


Membrane co-transport proteins that use a five-helix inverted 
repeat motif have recently emerged as one of the largest structural 
classes of secondary active transporters. However, despite many 
structural advances there is no clear evidence of how ion and 
substrate transport are coupled. Here we report a comprehensive study 
of the sodium/galactose transporter from Vibrio parahaemolyticus 
(vSGLT), consisting of molecular dynamics simulations, biochemical 
characterization and a new crystal structure of the inward-open 
conformation at a resolution of 2.7 Å. Our data show that sodium exit 
causes a reorientation of transmembrane helix 1 that opens an inner 
gate required for substrate exit, and also triggers minor rigid-body 
movements in two sets of transmembrane helical bundles. This 
cascade of events, initiated by sodium release, ensures proper timing 
of ion and substrate release. Once set in motion, these molecular 
changes weaken substrate binding to the transporter and allow galac- 
tose readily to enter the intracellular space. Additionally, we identify 
an allosteric pathway between the sodium-binding sites, the unwound 
portion of transmembrane helix 1 and the substrate-binding site that 
is essential in the coupling of co-transport. 

Secondary active transporters harness the energy stored in electro- 
chemical gradients to drive the accumulation of specific solutes across 
cell membranes. This task is accomplished by the alternating-access 
mechanism, in which the substrate-binding site is first exposed to one 
side of the membrane and, on ion and substrate binding, a conformational 
change exposes the transported solute to the opposite face, 
where it is released. Sodium/glucose co-transporters are prototypes 
of secondary active transporters that drive the accumulation of sugars 
and other molecules into cells. These transporters have critical roles in 
human physiology, where mutations in their genes are responsible for 
severe congenital diseases and are the molecular targets for drugs to 
treat diabetes and obesity. 

There has been a recent surge of work on crystal structures 
displaying the five-helix inverted repeat motif. These are referred to as the 
LeuT superfamily and include genetically diverse proteins that transport 
a wide range of substrates and differ in the number and type of driving 
ligand. A general model for alternating access is being pieced together 
through comparisons of these diverse structures. Despite sharing a 
common set of ten core transmembrane segments, the lack of sequence 
similarity and the chemical diversity of the transported substrates 
prevent the complete understanding of the mechanistic basis of transport. 
This hurdle is being surmounted as multiple structures of the same 
protein, at different stages in the transport cycle, are solved, providing 
a comprehensive understanding of substrate binding and the 
transition from outward- to inward-facing conformations. However, an 
atomic-level understanding of sodium-coupled substrate co-transport, 
necessary to explain the dynamics of alternating access, is still absent. 

To investigate the mechanism of sodium-sugar coupling, we carried 
out molecular dynamics simulations on the galactose-bound 
inward-occluded conformation of vSGLT embedded in a lipid bilayer. All 
sodium co-transporters of the LeuT superfamily share a common 
sodium-binding site termed the Na2 site. During the transition to 
the inward-facing conformation, transmembrane helix (TM) 8, which 
forms part of the sodium-binding site, is displaced by ~4 Å, generating 
a less favourable Na2 site that facilitates Na⁺ release (Fig. 1b). Na⁺ 
modelled at this site is loosely coordinated by the carbonyl oxygens of 
Ile 65 (3.3 Å), Ala 361 (3.2 Å) and the side-chain hydroxyl of Ser 365 
(3.1 Å). The carbonyl oxygen of Ala 62 (3.6 Å) and the side-chain 
hydroxyl of Ser 364 (3.6 Å) are also in close proximity (Fig. 1b). 
Previous molecular dynamics simulations performed on vSGLT 
and Mhp1 indicated that Na⁺ quickly leaves the Na2 site. Our 
simulations indicate that Na⁺ exits the Na2 site after 9 ns (Fig. 2a) and 
interacts with the hydrophilic pore-lining residue Asp 189 on TM5 
during exit. The importance of Asp 189 was highlighted in a previous 
simulation and in biochemical studies on hSGLT1. All three 
molecular dynamics simulations indicate that Na⁺ exits the 
transporter before substrate exit; however, additional conformational 
changes are required to release the occluded galactose. 

In the inward-occluded structure, galactose is located halfway across 
the membrane (Figs 1a and 2b), where it is coordinated by extensive 
side-chain interactions from TM1, TM2, TM6, TM7 and TM10. Subsets 
of these residues form two hydrophobic gates blocking galactose exit to 
the intracellular and extracellular spaces. Our molecular dynamics 
simulations show that as Na⁺ exits the Na2 site, galactose undergoes 
significant fluctuations within the binding pocket. At 52 ns, Tyr 263 
adopts a new and stable rotamer conformation that expands the exit 
pathway (between 52 and 110 ns), permitting the sugar to leave the 
binding site (Figs 2 and 3). After sugar release (~110 ns), Tyr 263 
returns to the original conformation. 

To test the hypothesis that Na⁺ release stimulates an alternative 
conformation of Tyr 263, we conducted a 200-ns molecular dynamics 
simulation in which the sodium was lightly restrained in the Na2 site. Under 
these conditions, Tyr 263 never adopts the alternative conformation, and 
thereby prevents galactose exit (Supplementary Fig. 1). This observation 
suggests that sodium release drives conformational changes that disrupt 
the galactose-binding site, and further suggests that interactions between 
the Na2 site and Tyr 263 are central to the transport mechanism. 

The spontaneous release of galactose in the absence of applied forces 
makes possible the accurate determination of the binding free energy 
profile through the use of umbrella sampling along the exit pathway 
coupled with weighted histogram analysis (Fig. 3, inset). After Na⁺ 
release, galactose is weakly bound to vSGLT with a minimal energy 
barrier of ~2 kcal mol⁻¹, resulting from the interaction of the sugar 
with residues Asn 64, Ser 66, Glu 68 and Gln 69 on TM1. Asn 64 is of 
particular interest because it is located in the unwound segment of 
TM1 and has hydrogen bonds with the inner gate residue Tyr 263 and 
the O2 hydroxyl of galactose, linking the Na2 site with the galactose 
site. Thus, the interactions of Asn 64 with Tyr 263 and galactose may 
be critical to the transport mechanism. 
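
To put the ~2 kcal mol⁻¹ barrier in perspective, it can be compared with the thermal energy. The sketch below assumes a temperature of 300 K, which is not stated in this part of the text, and uses a simple Boltzmann factor rather than any rate theory; the function names are illustrative.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def barrier_in_kt(barrier_kcal, temperature_k):
    """Barrier height expressed in units of the thermal energy kT (per mole: RT)."""
    return barrier_kcal / (R_KCAL * temperature_k)


def boltzmann_factor(barrier_kcal, temperature_k):
    """Relative weight exp(-ΔG/RT) of the barrier top versus the bound minimum."""
    return math.exp(-barrier_in_kt(barrier_kcal, temperature_k))


ASSUMED_T = 300.0  # kelvin; an assumed value, not taken from the text
height = barrier_in_kt(2.0, ASSUMED_T)
weight = boltzmann_factor(2.0, ASSUMED_T)
print(round(height, 2), round(weight, 3))
```

At the assumed temperature the barrier is only a few kT, which fits the description of galactose as weakly bound once Na⁺ has left.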

To test the importance of these interactions, we performed molecu- 
lar dynamics simulations and sodium-dependent transport assays on 

Figure 1 | Structures and overlay of the inward-open and inward-occluded 
conformations. a, The core domain of the inward-open conformation (TM1–TM10) 
is coloured by specific helix bundles involved in the transition from the 
inward-occluded to the inward-open conformation. The ‘hash motif’ formed 
from TM3, TM4, TM8 and TM9 is blue; the ‘sugar bundle’ formed from TM2, 
TM6 and TM7 is green; TM1 is red; and TM5 and TM10 are magenta. The 
periphery helices (TM−1, TM11, TM12 and TM13) are yellow. Atoms are 
displayed in ball-and-stick form with oxygen coloured red and nitrogen 
coloured blue. Inset, an overlay of the inward-open (colour) and inward- 
occluded (grey) conformations illustrating the coordination at the Na2 and 
galactose-binding sites. b, c, Overlay of the inward-open and inward-occluded 
conformations with the same colouring as in a. Conformational changes in the 
inward-open structure reveal a ~13° kink in the unwound segment of TM1 
that prevents sodium coordination at the Na2 site (b). In the absence of 
galactose, the galactose-binding residue Asn 64 hydrogen-bonds to Glu 88 and 
Tyr 263, maintaining an open pathway from the intracellular space to the 
substrate-binding site (c). 


transporters with mutations at positions 64 and 263. Simulations of the 
Asn 64 Ala mutant show a momentary sodium departure from the Na2 
site at 5 ns, but the ion rapidly returns and remains for the remainder of 
the simulation. The failure of Na⁺ to unbind prevents conformational 
changes in the unwound segment of TM1, and Tyr 263 remains in the 
blocked orientation (Supplementary Fig. 2a). In agreement with the 
simulation, sodium-dependent transport assays on the Asn 64 Ala 
mutant show no activity (Fig. 2c). 

To explore the role of Asn 64 further, we tested Asn 64 Gln and 
Asn 64 Ser, which, in principle, are both capable of maintaining the 
native hydrogen bonds to Tyr 263 and galactose. Models of Asn 64 Gln 
prevented simulation as the result of substantial steric clashes, which 
correlated well with a lack of transport (Fig. 2c). The model of the 
Asn 64 Ser mutation could form a hydrogen bond with Tyr 263 (3.3 Å) 
but not with the O2 hydroxyl of galactose (4.3 Å). In the simulation, 
Asn 64 Ser releases Na⁺ from the Na2 site at 25 ns, and Tyr 263 
transiently adopts the alternative rotamer conformation before returning 
to its original position, preventing galactose exit (Supplementary Fig. 
2c). Similarly, simulation of the Tyr 263 Phe mutant shows that Na⁺ 
unbinds at 10 ns, but unlike tyrosine, phenylalanine never adopts a 
conformation compatible with galactose exit (Supplementary Fig. 2b). 
Longer simulations may reveal galactose release in these mutants, 
because they both show modest transport activity (~10% of wild type); 
however, the transport assays and simulation data demonstrate that 

Figure 2 | Mechanism of galactose release. a, Sodium and galactose exit 
vSGLT. The root mean squared deviation (r.m.s.d.) of Na⁺ (green) rapidly 
increases at 9 ns, indicating exit from the Na2 site. This is followed by the release 
of galactose (red) at 110 ns. b, Tyr 263 adopts two rotamers. On the left, Tyr 263 
is shown in the conformation observed in the inward-occluded structure, in 
which it blocks substrate exit through a hydrogen bond with Asn 64 on TM1. At 
52 ns (shown on the right), Tyr 263 adopts a rotamer conformation that 
expands the exit pathway. c, D-galactose uptake by wild-type and vSGLT 
mutants in proteoliposomes. Results are expressed as percentage uptake in 
either 100 mM NaCl or KCl, and show that the mutants Asn 64 Ala, Asn 64 Ser, 
Asn 64 Gln and Tyr 263 Phe severely impair sodium-dependent transport. 
Error bars, s.e.m. WT, wild type. 


robust transport requires precise orientation of Asn 64 to stabilize 
galactose and the gating residue Tyr 263. 

Although the molecular dynamics simulations and biochemical 
studies demonstrate a physical link between the Na2 site and the 
substrate, global details regarding the inward-open conformation 
(devoid of both ligands) remain elusive. To address this issue, we 
determined the structure of vSGLT in the inward-open conformation. 
Crystals, in the absence of ligands, for both the wild-type protein and 
the inactive Lys 294 Ala mutant were obtained. Both crystals had the 
same overall configuration, but the mutant crystals diffracted to a 
higher resolution (2.7 Å; see Methods). 

As in the original structure, the inward-open conformation is 
composed of 14 transmembrane helices, ten of which comprise the core 
domain. TM1–TM5 and TM6–TM10 are related by an approximate 
two-fold symmetry axis through the centre of the membrane plane. 
The inward-occluded and the inward-open structures have a similar 
overall fold with a r.m.s.d. of 1.2 Å. However, there are distinct 
structural differences between the two conformations, presumably owing to 
changes resulting from the release of ligands (Fig. 4). With the excep- 
tion of TM1, superimpositions of individual helices reveal that the 
occluded-to-open transition occurs by rigid-body movements of sub- 
domains (Supplementary Figs 3 and 4). Consistent with the recent 

Figure 3 | The potential of mean force for galactose unbinding. Energy of 
galactose binding to vSGLT in the absence of Na⁺. Umbrella sampling along 
the natural, equilibrium pathway shown (inset) was used to determine the 
binding free energy. The distance along the pathway from the binding site in the 
X-ray structure is shown along the x axis. The coloured arrows correspond to 
the galactose positions shown in the inset. The largest barrier is ~2 kcal mol⁻¹, 
at 5 Å, which corresponds to galactose interaction with residues in the kink 
region of TM1. Error bars were determined by splitting the production data 
into four equal sets, computing the energy profile for each set, and then 
applying a global shift to each curve before calculating the standard deviation at 
the 16 positions marked with points. 


assignment for Mhp1, the hash motifs, formed from TM3 and TM4 
and their inverted repeat equivalents, TM8 and TM9, align with a 
r.m.s.d. of 0.9 Å. TM2, TM6 and TM7 form a domain termed the sugar 
bundle for the extensive side-chain interactions with galactose, and 
these regions superimpose with a r.m.s.d. of 0.5 Å. This new inward-open 
structure of vSGLT is more similar to the recent inward-facing 
conformation structure of Mhp1 than is the previous structure of 
vSGLT. Details of this structural analysis are in Methods. 
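
The r.m.s.d. values quoted for superimposed helices and subdomains can be illustrated with a minimal sketch of the Kabsch superposition. This is an assumption for illustration only: the text does not state which superposition program was used, and the function name and the NumPy dependency are not from the paper.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimum r.m.s.d. between two (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm). Illustrative only."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T  # rotation that maps centred P onto centred Q
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Applied to matched atoms (for example Cα atoms of the same helices in two structures), the function returns the kind of value reported above in ångströms, provided the two coordinate arrays list equivalent atoms in the same order.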

The transition from the inward-occluded to the inward-open structure 
is presumably triggered by sodium release from the Na2 site and 
the alteration in the hydrogen-bonding network surrounding the 
unwound segment of TM1. In particular, the intracellular half of 
TM1 flexes ~13°, modifying the coordination of Asn 64 (Figs 1b 
and 4a). In the absence of both galactose and Na⁺, Asn 64 coordinates 
Tyr 263 and Glu 88. Glu 88 was previously hydrogen-bonded to the O2 
and O3 hydroxyls of galactose. This new conformation of TM1 is 
further stabilized by hydrogen bonds between the Na2-site residue 
Ser 365 and Glu 68 on the unwound segment of TM1 (Supplementary 
Fig. 5). When viewed from the intracellular side, each domain 
moves ~3° in opposite directions, thereby increasing the volume of the 
accessibility cavity by ~1,400 Å³ (Fig. 4). This 6° relative rotation 
probably disrupts protein-substrate coordination and permits water 
to enter the site. This interpretation is supported by our simulations, in 
which an increase in the number of water molecules in the substrate- 
binding site is observed after sodium release (Supplementary Fig. 6). 
Water effectively competes with the protein for hydrogen bonds, loos- 
ening galactose in the pocket and ultimately assisting in its release 
(Supplementary Fig. 6 and Supplementary Movie 1). 

We propose the following mechanism for sodium and galactose exit 
from vSGLT. The transition from the outward- to the inward- 
occluded conformation weakens the Na2 sodium-binding site, causing 
it to become metastable and release the ion on a short timescale. Upon 
exit, the carbonyl oxygens of the ion-coordinating residues Ile 65 and 

Figure 4 | Conformational changes in the transition from the inward-occluded 
to the inward-open structure. a, TM1 superimposed between the 
inward-open (red) and inward-occluded (grey) structures, showing a ~13° 
kink in TM1. b, Overlay of the inward-open (coloured as in Fig. 1) and 
inward-occluded (grey) conformations. Rigid-body rotations of the hash motif and 
sugar bundle by 3° in opposite directions expose the substrate-binding site to 
the intracellular environment. c, Accessibility cavity of the inward-occluded 
conformation is coloured blue. d, Accessibility cavity of the inward-open 
conformation is coloured gold. The conformational changes from TM1, hash 
motif and sugar bundle cause an increase of ~1,400 Å³ in the accessible volume 
of the inward-open conformation, aiding galactose release. 


Ala 62 undergo a conformational change in the unwound segment of 
TM1, producing a kink of ~13° (Figs 1 and 4a). Our simulation shows 
that movement of TM1 disrupts the hydrogen bond between Asn 64 
and Tyr 263, allowing the side chain of Tyr 263 to adopt a new con- 
formation that opens a pathway to the intracellular space (Fig. 2b). 
Additional rigid-body movements widen the intracellular cavity, 
allowing water penetration and further disrupting the substrate-binding 
site to enhance exit and prevent rebinding (Fig. 4). 

It is likely that the reaction scheme described here for vSGLT is 
broadly used by all sodium-dependent members of the LeuT superfamily, 
because the Na2 site, the hydrophobic gates and the unwound 
segments on TM1 and TM6 are all conserved. For proteins with 
a single sodium-binding site, the Na2 site directly interacts with the 
substrate through polar residues (Asn 64 in vSGLT and Gln 42 in 
Mhp1) located on the unwound segment of TM1. For proteins that 
harbour two sodium-binding sites, such as LeuT and, putatively, BetP, 
the additional site (the Na1 site) is positioned on the opposite side of 
the unwound helix from the Na2 site. Interactions between the Na1 
and Na2 sites are mediated by the unwound segment of TM1, and the 
sodium at the Na1 site is directly coordinated to the substrate. 
Regardless of whether the protein has one or two sodium-binding sites, 
it is the conserved Na2 site, positioned most distal from the core of the 
protein, that regulates sodium and substrate release. This primary 
structural feature coupling sodium and substrate co-transport has 
fundamental implications for our understanding of membrane protein 
biology and for developing strategies to manipulate the alternating- 
access mechanism therapeutically. 

METHODS SUMMARY 

Molecular dynamics simulations. The vSGLT monomer (Protein Data Bank ID, 
3DH4) was embedded and solvated in a 1-palmitoyl-2-oleoyl phosphatidylcholine 
membrane bilayer using the OPM and CHARMM-GUI software packages. 
Simulations were carried out using NAMD with the CHARMM27 parameter 
set in a 150 mM NaCl bath. See Methods for more details. 

Protein expression and purification. Plasmids carrying wild-type or mutant 
transporters were transformed and overexpressed in the TOP10 Escherichia coli cell 
line. Cell membranes were isolated, solubilized (2% w/v decyl-β-D-maltopyranoside) 
and tandem-purified using a Ni-NTA Superflow column (affinity chromatography) 
and a Superdex 200 column (size-exclusion chromatography). See Methods for 
more details. 


Transport assays. We generated proteoliposomes by reconstituting purified 
vSGLT protein with sonicated lipid at a protein/lipid ratio of 1:200. We measured 
transport activity by monitoring the uptake of D-galactose, with ¹⁴C-D-galactose 
tracer, into proteoliposomes in the presence or absence of a 100 mM Na⁺ gradient 
(K⁺ replacing Na⁺). See Methods for more details. 

Crystallization and data collection. We concentrated purified wild-type and 
Lys 294 Ala protein to ~13 mg ml⁻¹ and grew crystals by the hanging-drop 
vapour diffusion method using the Mosquito nanolitre-dispensing robot. Data 
collected at the Advanced Light Source, Berkeley (beamline 5.0.2), were integrated 
and scaled, and phases were calculated by molecular replacement. The model was 
built and refined to an Rwork/Rfree value of 25.1/27.4. See Methods for more details. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Molecular dynamics simulations. Initially, TM−1 was removed and six missing 
residues in the TM4-TM5 loop were added with the loop modelling routine in 
Modeller. Residues 53-547 were then embedded in a membrane and solvated in 
a hexagonal box approximately 96 × 96 × 84 Å³ in volume for a total of 63,000 
atoms. Electroneutrality was enforced with the addition of 150 mM NaCl. 

Simulations were carried out with the CMAP-corrected CHARMM27 parameter 
set and the TIP3P water model. VMD and MATLAB were used for 
visualization and analysis. The system was minimized using conjugate gradient 
minimization and heated to 310 K using Langevin dynamics with a 10-ps⁻¹ 
damping coefficient. An initial 300-ps equilibration using the NVT ensemble 
was carried out in which water, galactose, Na⁺ and heavy backbone and side-chain 
atoms were constrained in a harmonic potential with a force constant of 
k = 10.0 kcal mol⁻¹ Å⁻². We then switched to an NPT ensemble, and the 
restraints on the water molecules and heavy side-chain atoms were gradually 
removed in five steps over 1.5 ns. All remaining restraints were removed in six 
steps over the next 1.8 ns. Finally, 10 ns of restraint-free simulation was run. All 
production runs start from this equilibrated system. A Langevin piston with a 200-fs 
period and 100-fs decay was used to set the pressure to 1 atm. Hydrogen bond 
lengths were constrained with SHAKE, and a 2-fs time step was used. A 10 Å van 
der Waals cut-off was used along with the particle mesh Ewald method for the 
electrostatics. 
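
As a quick tally of the equilibration protocol described above, the sketch below sums the stated stage durations and converts them to integration steps at the 2-fs time step. The stage labels are paraphrases and the variable names are illustrative; minimization and heating are omitted because their lengths are not given in the text.

```python
# Stage durations in picoseconds, taken from the protocol described above.
STAGES_PS = [
    ("NVT, harmonic restraints on heavy atoms", 300),
    ("NPT, water and side-chain restraints released in five steps", 1500),
    ("NPT, remaining restraints released in six steps", 1800),
    ("NPT, restraint-free", 10000),
]
TIMESTEP_FS = 2  # integration time step stated in the text

def steps_for(duration_ps, timestep_fs=TIMESTEP_FS):
    """Number of integration steps needed to cover duration_ps."""
    return duration_ps * 1000 // timestep_fs

total_ps = sum(duration for _, duration in STAGES_PS)
total_steps = sum(steps_for(duration) for _, duration in STAGES_PS)

for name, duration in STAGES_PS:
    print(f"{name}: {duration} ps = {steps_for(duration):,} steps")
print(f"total: {total_ps / 1000} ns = {total_steps:,} steps")
```

Under these figures the equilibration covers 13.6 ns before any production run begins.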

Simulations with the restrained ion were carried out by holding the Na⁺ in a weak 
spherical harmonic potential using the distance from the Na⁺ to the centre of mass 
(COM) of the five coordinating residues. The equilibrium distance, based on the 
inward-occluded structure, was 1.39 Å, and a force constant of k = 0.5 kcal mol⁻¹ Å⁻² 
produced minimal distortions in the protein. A 10-ns equilibration was run before a 
200-ns production run. The r.m.s.d. for the entire protein was 2.5 Å in the 
unrestrained simulation and 3.0 Å in the restrained-Na⁺ simulation. 
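
The restraint on the ion can be sketched as a harmonic penalty on the distance between the Na⁺ and the COM of its coordinating residues. The functional form E = ½k(r − r₀)² is an assumption (simulation packages differ on whether the factor ½ is included), and the function names are illustrative rather than part of any simulation package.

```python
import math

def centre_of_mass(coords, masses):
    """Mass-weighted centre of a set of (x, y, z) coordinates."""
    total = sum(masses)
    return tuple(
        sum(m * c[i] for c, m in zip(coords, masses)) / total
        for i in range(3)
    )

def restraint_energy(ion_xyz, com_xyz, k=0.5, r0=1.39):
    """Harmonic restraint energy in kcal/mol for distances in angstroms.

    Defaults are the force constant and equilibrium distance quoted in
    the text; the 0.5 prefactor is an assumed convention.
    """
    r = math.dist(ion_xyz, com_xyz)
    return 0.5 * k * (r - r0) ** 2
```

With these defaults, an ion displaced 2 Å beyond the equilibrium distance would incur a penalty of about 1 kcal mol⁻¹, which illustrates why the restraint is described as weak.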

Potential of mean force calculation. The potential of mean force (PMF) was 
calculated using umbrella sampling with WHAM. We extracted 69 snapshots 
along the pathway and held each configuration in a harmonic potential 
(k = 7.0 kcal mol⁻¹ Å⁻²) with a resting length equal to the z component of the 
distance between the galactose COM and the binding-site COM defined by the 
binding-site residues. Two-nanosecond trajectories were run for each umbrella, 
and the last 1,800 ps were used for calculating the PMF. Splitting the trajectories 
into two equal parts (200-1,000 ps and 1,200-2,000 ps) and computing separate 
PMFs revealed that the total PMF is well converged. 
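
The error-bar procedure described in the Figure 3 legend (split the data into sets, compute a profile per set, shift each curve globally, take the standard deviation at each position) can be sketched as follows. The alignment rule used here, subtracting each curve's own mean, is an assumption; the text does not say how the global shift was chosen. The function name is illustrative.

```python
import statistics

def block_error_bars(profiles):
    """Per-position standard deviation across block-wise energy profiles.

    profiles: list of equal-length curves, one per data block, each
    sampled at the same positions along the pathway.
    """
    # Global shift: remove each curve's arbitrary free-energy offset by
    # subtracting its own mean (assumed alignment rule).
    shifted = [
        [energy - statistics.fmean(curve) for energy in curve]
        for curve in profiles
    ]
    # Sample standard deviation across blocks at each position.
    return [statistics.stdev(column) for column in zip(*shifted)]
```

Curves that differ only by a constant offset give zero error at every position, which is the point of applying the shift before computing the spread.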

Protein purification. vSGLT proteins were cloned, expressed and purified as 
previously described. Briefly, the plasmids were transformed into the 
TOP10 cell line, grown to an OD600 of 1.8 and induced with 0.66 mM L-arabinose 
for 4 h at 29 °C. Cell membranes were isolated, solubilized with 2% 
decyl-β-D-maltopyranoside and affinity-purified on a Ni-NTA column. The sample was 
further purified by size-exclusion chromatography (Superdex 200) and washed 
with crystal buffer (20 mM Tris (pH 7.5), 25 mM NaCl, 0.174% 
decyl-β-D-maltopyranoside) in a 50-kDa Amicon filter unit. 

Transport assays. Mutants were created with the QuikChange method and purified 
as above. vSGLT protein was reconstituted in 150 mM KCl, 10 mM Tris/ 
Hepes (pH 8.0), 1 mM DTT, 1 mM Na₂EDTA, 1 mM CaCl₂, 1 mM MgCl₂ and 
0.5% decyl-β-D-maltopyranoside, with 1.2 mg ml⁻¹ sonicated lipid (90 mg asolectin 
soy lecithin, 10 mg cholesterol) at a protein/lipid ratio of 1:200. Addition of 5 mg ml⁻¹ 
SM-2 Bio-Beads initiated the reconstitution and the mixture was incubated overnight 
at 4 °C. The proteoliposomes were collected and washed twice by centrifugation. 
Pelleted proteoliposomes were resuspended and underwent three freeze-thaw cycles 
in liquid nitrogen. 

Uptake of D-galactose (88 µM) with ¹⁴C-D-galactose tracer into proteoliposomes 
was measured for 18 min at 22 °C in the presence or absence of 100 mM 
Na⁺ (K⁺ replacing Na⁺) as described previously. Proteoliposomes were collected 
by filtration through 0.45-µm Millipore filters and the uptake was quantified 
by scintillation counting. Results are expressed as the mean ± s.e.m. of three 
determinations and three trials. 
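
For reference, the mean ± s.e.m. summary used for the uptake data can be computed as in this minimal sketch. The function name is illustrative and no values from the paper are reproduced.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements.

    The s.e.m. is the sample standard deviation divided by sqrt(n).
    """
    n = len(values)
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(n)
```

For replicate uptake measurements, `mean_sem(replicates)` gives the bar height and the half-length of the error bar.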

Crystallization. Protein was concentrated to ~13 mg ml⁻¹ before plating. 
Optimization by additive screening gave the best diffracting crystals with a reservoir 
solution containing 0.1 M MES (pH 6.5), 4% MPD and 9-13% PEG400, and 
tridecyl-β-D-maltopyranoside to a final concentration of 0.0017% as an additive. 
Before freezing, crystals were cryoprotected using a solution containing 30% 
PEG400 and 0.174% decyl-β-D-maltopyranoside. 

Data processing, phasing and refinement. Data were collected at 1.0 Å on 
cryo-cooled crystals (100 K) at the Advanced Light Source (beamline 5.0.2). Five data sets 
were integrated using HKL2000 and merged and subjected to B-factor sharpening 
using an anisotropy correction server (resolution cut-offs: a = 3.1 Å, b = 2.7 Å 
and c = 2.8 Å). Phases were calculated by molecular replacement (PHASER) 
using the original vSGLT structure as a search model. The model was built in 
COOT and refined using PHENIX and BUSTER using non-crystallographic 
symmetry (NCS) and TLS refinement restraints. There are two molecules per 
asymmetric unit with the A molecule displaying sharper electron density and lower 
B factors (88.5 Å²) than the B molecule (131.3 Å²). The model was built and refined 
to an Rwork/Rfree value of 25.1/27.4. The Ramachandran statistics are as 
follows: 95.5% of the residues lie in the preferred region, 4.3% lie in the allowed 
region and 0.2% are outliers. 

The 2Fo−Fc maps contained three elongated features having a maximal peak 
height of 3σ. These attributes were interpreted and assigned as PEG molecules. 
Two are located at the periphery, whereas the third is near the Na2 site as observed 
in the Mhp1 structure and is proposed to stabilize the inward-facing conformation. 

The Lys 294 Ala protein crystals diffract to higher resolution than the wild-type 
crystals. Data from four wild-type crystals were collected and merged to achieve a 
3.7 Å resolution data set. Difference Fourier maps were calculated against the final 
Lys 294 Ala mutant model and no significant peaks were observed. The 
Lys 294 Ala model was further refined using PHENIX to yield an Rwork/Rfree value 
of 30.7/34.8. 

We note that although refinement was carried out with data subject to aniso- 
tropic correction, as described above, the deposited data has not been treated. 
Figures were created from the A-chain protomer using PYMOL. 

Structural comparison of vSGLT with Mhp1. Superpositions of the inward-occluded 
(Protein Data Bank ID, 3DH4) and inward-open conformations of 
vSGLT with the inward-facing conformation of Mhp1 (Protein Data Bank ID, 
2X79) reveal they all share a similar global fold. The largest differences are centred 
near the substrate- and ion-binding sites. The Na2-site helices (TM1, TM5 and 
TM8) of the inward-open conformation of vSGLT have a closer fit to Mhp1 
(r.m.s.d., 2.2 Å) than the inward-occluded conformation (r.m.s.d., 2.6 Å); thus, 
the inward-open vSGLT structure more closely resembles the structure of Mhp1. 

Medulloblastoma encompasses a collection of clinically and molecularly 
diverse tumour subtypes that together comprise the most 
common malignant childhood brain tumour. These tumours are 
thought to arise within the cerebellum, with approximately 25% 
originating from granule neuron precursor cells (GNPCs) after 
aberrant activation of the Sonic Hedgehog pathway (hereafter, 
SHH subtype). The pathological processes that drive heterogeneity 
among the other medulloblastoma subtypes are not known, hindering 
the development of much needed new therapies. Here we provide 
evidence that a discrete subtype of medulloblastoma that contains 
activating mutations in the WNT pathway effector CTNNB1 (hereafter, 
WNT subtype) arises outside the cerebellum from cells of 
the dorsal brainstem. We found that genes marking human WNT-subtype 
medulloblastomas are more frequently expressed in the 
lower rhombic lip (LRL) and embryonic dorsal brainstem than in 
the upper rhombic lip (URL) and developing cerebellum. Magnetic 
resonance imaging (MRI) and intra-operative reports showed that 
human WNT-subtype tumours infiltrate the dorsal brainstem, 
whereas SHH-subtype tumours are located within the cerebellar 
hemispheres. Activating mutations in Ctnnb1 had little impact on 
progenitor cell populations in the cerebellum, but caused the abnormal 
accumulation of cells on the embryonic dorsal brainstem which 
included aberrantly proliferating Zic1⁺ precursor cells. These 
lesions persisted in all mutant adult mice; moreover, in 15% of cases 
in which Tp53 was concurrently deleted, they progressed to form 
medulloblastomas that recapitulated the anatomy and gene expression 
profiles of human WNT-subtype medulloblastoma. We provide 
the first evidence, to our knowledge, that subtypes of medulloblastoma 
have distinct cellular origins. Our data provide an explanation 
for the marked molecular and clinical differences between SHH- 
and WNT-subtype medulloblastomas and have profound implications 
for future research and treatment of this important childhood 
cancer. 

SHH-subtype medulloblastoma is characterized by aberrant SHH 
signalling that is often driven by inactivating mutations in PTCH1. 
These medulloblastomas tend to arise in very young children, display a 
‘large cell-anaplastic’ or ‘desmoplastic’ histology and have a relatively 
poor prognosis. WNT-subtype medulloblastomas are strikingly different. 
Arising in much older children, these highly curable tumours 
have ‘classic’ morphology and activating mutations in CTNNB1. 
Mouse models have shown that SHH-subtype medulloblastomas arise 
from committed GNPCs of the cerebellum and enabled the development 
of new therapies that suppress the oncogenic SHH signal. It has 
been suggested that the other medulloblastoma subtypes might have a 
different cellular origin, but little is known about their biology and 
there are no mouse models of these tumours. 

Recently, we showed that subtypes of the brain tumour ependymoma 
arise from discrete populations of neural progenitor cells with which 
they share similar gene expression profiles. Therefore, to determine if 
medulloblastoma subtypes also arise from discrete cell populations, we 
first used four online gene expression databases to chart the regional 
expression of 110 genes that mark human SHH- or WNT-subtype 
medulloblastomas. Twenty-four WNT-subtype and 25 SHH-subtype 
medulloblastoma signature genes are contained within ‘Brain Explorer 
2’, which generates three-dimensional gene expression maps across the 
mouse brain (www.brain-map.org, Supplementary Methods and 
Supplementary Data Set 1). As expected, these data revealed the 
URL at embryonic day (E) 11.5 and the cerebellum at E15.5 to be the 
most common sites of SHH-subtype signature gene expression (Fig. 1a, 
b and Supplementary Data Set 1). In contrast, WNT-subtype medulloblastoma 
signature genes were predominantly expressed within the 
LRL at E11.5 (rhombomeres (r) 2-r8) and the dorsal brainstem at 
E15.5. Expression of an additional 61 medulloblastoma signature genes, 
reported by three other online databases, confirmed this differential 
pattern (Supplementary Fig. 1 and Supplementary Table 1). These data 
suggest that SHH- and WNT-subtype medulloblastomas arise from 
distinct regions of the hindbrain and identify the dorsal brainstem as 
a potential source of WNT-subtype tumours. 

If SHH- and WNT-subtype medulloblastomas have different origins, 
we reasoned that these tumours should demonstrate anatomical differ- 
ences at diagnosis. Remarkably, all validated WNT-subtype medullo- 
blastomas examined (n = 6/6, Supplementary Fig. 2) were located 
within the IV ventricle and infiltrated the dorsal surface of the brain- 
stem, whereas all SHH-subtype tumours (n = 6/6) were distributed 
away from the brainstem within the cerebellar hemispheres (Fig. 1c, d 
and Supplementary Fig. 3, exact Mann-Whitney P < 0.005). Five of the 
six WNT-subtype, but no SHH-subtype, tumours were adherent to the 
dorsal brainstem at surgery (Fisher’s exact test, P < 0.005). Thus WNT-subtype 
medulloblastomas are anatomically distinct from SHH tumours 
and are intimately related to the IV ventricle and dorsal brainstem. 

We noted various cell types surrounding the IV ventricle that 
could give rise to WNT-subtype medulloblastomas, including dorsal 
brainstem progenitors of cochlear, mossy-fibre and climbing-fibre 
neurons (Fig. 1a, b and Supplementary Fig. 4). But it remained possible 
that cerebellar ventricular-zone radial glia or GNPCs generate 
WNT-subtype medulloblastomas. To identify hindbrain cells that are 
susceptible to transformation by Ctnnb1, we generated mice carrying a 
cre-dependent mutant allele of Ctnnb1 (Ctnnb1lox(ex3)) and the Blbp-Cre 
transgene. Blbp-Cre induces efficient recombination in progenitor 
cell populations across the hindbrain including the cerebellar ventricular 
zone, GNPCs of the external germinal layer (EGL) and Olig3⁺ progenitor 
cells in the LRL (Supplementary Fig. 5). We also generated 
Blbp-Cre+/−;Ctnnb1+/lox(ex3) (hereafter, Ctnnb1-mutant) mice that were 
homozygous for a cre-dependent mutant allele of Tp53 (Tp53flx/flx) 
because loss of this tumour suppressor accelerates medulloblastoma 
formation in Ptch1+/− mice. As expected, Ctnnb1-mutant embryos 
expressed mutant nuclear-Ctnnb1 in all hindbrain germinal zones, 
regardless of Tp53 status (Supplementary Figs 5k and 6). Surprisingly, 
mutation of Ctnnb1 did not affect significantly the proliferation or 

Figure 1 | WNT and SHH subtypes of medulloblastoma are anatomically 
distinct. a, b, Expression distribution in (a) E11.5 and (b) E15.5 mouse 
hindbrain of orthologues that distinguish human WNT- and SHH-subtype 
medulloblastoma (Supplementary Data Set 1). Cartoons in b denote the 
position of rhombomeres relative to the cerebellum and brainstem. c, Top, pre- 
operative, and bottom, post-operative, MRI scans of exemplary SHH- and 
WNT-subtype medulloblastomas. Right panels show close-up views of left 
panels. Brainstem (BSt), post-operative tumour cavity (cvt). d, Frequency and 
site of post-operative surgical cavities of SHH- (n = 6) and WNT- (n = 6) 
subtype medulloblastomas. Axial (left) and sagittal (right) views are shown. 




apoptosis of ventricular-zone cells or GNPCs in the cerebellum 
(Fig. 2a and Supplementary Fig. 7). 

Because GNPCs generate SHH-subtype medulloblastomas, we 
sought additional evidence that these cells are not impacted by mutant 
Ctnnb1. First, we generated Atoh1-Cre+/−;Ctnnb1+/lox(ex3) mice 
because Atoh1-Cre drives efficient recombination in GNPCs, generating 
medulloblastomas in conditional Ptch1 mice (see Supplementary 
Fig. 8a-j and ref. 7). We also used the Atoh1 enhancer element present 
in the Atoh1-Cre allele to drive expression of a constitutively active 
Ctnnb1-green fluorescence fusion protein (GFP) in GNPCs (Atoh1- 
Ctnnb14N7°S"*, Supplementary Fig. 8k-o)”. Neither Atoh1-Cre*’; 
Ctnnb1*"°*) nor Atoh1-Ctnnb 148°"? mice (more than 20 mice 
examined each) developed hyperplasia or masses within the URL or 
EGL. Concordantly, aberrant Ctnnb1 signalling did not impact the 
proliferation of GNPCs ex vivo (Supplementary Fig. 8p). Thus, in 
contrast to aberrant Shh signalling, mutant Ctnnb1 does not appear 
to disrupt cell-cycle or differentiation control in GNPCs. 

In stark contrast to the cerebellum, by E16.5 all Ctnnb1-mutant mice 
developed aberrant cell collections in the dorsal brainstem that persisted 
into adulthood (exact Mann-Whitney P < 0.005, Fig. 2a-f). 
These cells were marked by Olig3 and Pax6, which suggested they 
may be derived from progenitor cells within the LRL (Fig. 2d, e). 
This abnormality was independent of Tp53 status and did not involve 
the floor plate, which is not targeted by Blbp-Cre (Supplementary Fig. 9). 
Progenitors within the embryonic dorsal brainstem proliferate to 
produce daughter cells that express specific marker proteins and follow 
complex migration streams to their respective nuclei in the developing 
brainstem (Supplementary Fig. 4). We observed no significant 
differences in the overall proliferation (Ki67 labelling), apoptosis 
(TdT-mediated dUTP nick end labelling) or cell-cycle duration 
(5-bromo-2-deoxyuridine pulse-chase) of progenitors in the dorsal brainstem 
of Ctnnb1-mutant versus control mice (Fig. 2c, data not shown). 
However, a significant fraction of proliferating cells within Ctnnb1-mutant 
dorsal brainstems expressed Zic1 (37% Zic1⁺/Ki67⁺ = 122/322; 
Fig. 2c, f-h). This expression is aberrant because Zic1 normally 
marks postmitotic mossy-fibre neuron precursors as they exit the dorsal 
brainstem to form nuclei in the ventral brainstem (Fig. 2g). Thus 
mutant Ctnnb1 might stall the dorso-ventral migration of brainstem 
neuron precursors, resulting in aberrant dorsal cell collections. To test 
this, we used in utero GFP electroporation to track the fate of embryonic 
dorsal brainstem precursors (Fig. 2i-q and Supplementary Figs 10 
and 11). GFP-labelled Zic1⁺ mossy-fibre neuron precursors underwent 
normal migration from the dorsal brainstem to the pontine grey 
nucleus and other brainstem nuclei in control mice (Fig. 2k-n and 
Supplementary Fig. 11). In contrast, mutation of Ctnnb1 markedly 
reduced the numbers of precursors transiting from the dorsal brainstem 
to the pontine grey nucleus (Fig. 2o-q; exact Mann-Whitney, 
P < 0.05). Together, these data demonstrate that mutant Ctnnb1 disrupts 
the normal differentiation and migration of progenitor cells on 
the dorsal brainstem, resulting in the accumulation of aberrant cell 
collections. These cells may include stalled mossy-fibre neuron precursors, 
but further work is required to determine their precise lineage. 

Aberrant cell collections in the dorsal brainstem of Ctnnb1-mutant 
mice are reminiscent of the EGL hyperplasia that precedes formation of 
SHH-subtype medulloblastoma in the cerebellum of Ptch1-deficient 
mice. Therefore we aged Ctnnb1-mutant mice harbouring Tp53+/+, 
Tp53+/flx or Tp53flx/flx alleles to test if WNT-subtype medulloblastomas 
might arise from the dorsal brainstem (n ≥ 54 mice per genotype). 
Aberrant cell collections persisted throughout adulthood on the 
dorsal brainstem of all Ctnnb1-mutant;Tp53+/+ mice, but these animals 
did not develop medulloblastoma or tumours in any part of the 
hindbrain (median follow up 365 days). In contrast, 2 out of 10 
Ctnnb1-mutant;Tp53flx/flx mice aged older than 6 months harboured 
asymptomatic tumours that were confined to the dorsal brainstem 
(Supplementary Fig. 12). When aged for longer periods, 15% (n = 8/55) of 
Ctnnb1-mutant;Tp53flx/flx and 4% (n = 2/54) of Ctnnb1-mutant;Tp53+/flx 

Figure 2 | Mutant-Ctnnb1 causes aberrant accumulation of LRL cells. 

a, Low- (scale bar, 180 µm) and b, high- (scale bar, 50 µm) power views of LRL/ 
dorsal brainstem in Ctnnb1-mutant and wild-type E16.5 embryos; b includes 
the corresponding adult brainstem region. c, Volume and indicated 
immunoreactivity differences between Ctnnb1-mutant and wild-type LRL 
(n = 3 mice per group; bars, mean ± s.d.). Immunofluorescence of Olig3 
(d), Pax6 (e) and Zic1 (f) in Ctnnb1-mutant E16.5 LRL (left) and aberrant adult 
dorsal brainstem masses (right) (scale bar, 180 µm). Inset, high-power views 
(scale bar, 5 µm). g, Postmitotic mossy-fibre precursor neurons 


mice developed ‘classic’ medulloblastomas that were Zic1⁺ and contained 
populations of nuclear-Ctnnb1⁺/Olig3⁺ cells (median follow 
up 290 and 287 days, respectively; Fig. 3a-d). These mouse medulloblastomas 
displayed an immunoprofile similar to human WNT-subtype 
tumours and were invariably connected with the brainstem (Fig. 3d, e 
and Supplementary Fig. 13). In contrast, mouse models of human 
SHH-subtype medulloblastoma are nuclear-Ctnnb1 negative, arise 
within the cerebellum and do not invade the brainstem (Fig. 3d, e). 
Together, these data support the hypothesis that progenitor cells within 
the dorsal brainstem are susceptible to transformation by concurrent 

(Zic1⁺/Ki67⁻) exit the proliferating E16.5 control LRL. h, Ctnnb1-mutant LRL 
contains aberrant proliferating Zic1⁺ precursors (arrows; scale bar, 50 µm). 
i, GFP-electroporated wild-type LRL marks Olig3⁺ cells (j) and migrating 
precursors (arrows in i) that include Zic1⁺ mossy-fibre neurons (k) that form 
the pontine grey nucleus (l). GFP-fluorescence of whole (m, o) and sectioned 
(n, p) Ctnnb1-mutant and wild-type P0 hindbrains electroporated at E12.5. 
q, Mean ± s.d. of LRL/pontine grey nucleus GFP fluorescence in whole 
hindbrains of three Blbp-Cre;Ctnnb1+/+ and five Blbp-Cre;Ctnnb1+/lox(ex3) 
mice (graphs; *P ≤ 0.05, **P < 0.005, exact Mann-Whitney). 


mutation in Ctnnb1 and Tp53, resulting in the formation of tumours that 
mimic the anatomical features of human WNT-subtype medulloblas- 
toma. Deletion of Tp53 is presumably required to allow key second 
mutations during transformation of the LRL in Ctnnb1-mutant mice. 
Notably, we have observed two cases of TP53-mutant human WNT- 
subtype medulloblastoma, suggesting this gene also suppresses these 
tumours in humans (Supplementary Fig. 14). 

To test further the fidelity of Ctnnb1-mutant mouse medulloblas- 
toma as a model of human WNT-subtype disease, we compared the 
tumour transcriptomes in the two species using an algorithm we have 

Figure 3 | Mutant-Ctnnb1 and SHH-subtype mouse medulloblastomas are 
anatomically distinct. a, Tumour-free survival of SHH-subtype 
medulloblastoma mouse models (Nes-Cre*/ “Liga 3Tp53 ~~ Nes- 

Cre“! ;Xreca™"",T p53, Ptch1*'~sInk4c /~, Ptch1 p53, data from 
refs 14, 27, 28) and Ctnnb1-mutant;Tp53flx/flx and Ctnnb1-mutant;Tp53+/flx mice, 
**Log rank P < 0.0001. Immunofluorescence of (b) Zic1 and (c) Olig3 and 
Ctnnb1 expression in a Ctnnb1-mutant;Tp53flx/flx medulloblastoma. 
d, Haematoxylin and eosin-stained low- (i, v; scale bar, 800 µm) and high- (ii, vi; 
scale bar, 25 µm) power views of mouse medulloblastomas and tumour-brainstem 
interface (iii, vii; scale bar, 50 µm). Ctnnb1 immunostaining (iv, viii; scale bar, 
10 µm, arrows indicate nuclear immunoreactivity). Boxes indicate location of 
high-power views. e, Frequency and anatomical site of mouse medulloblastomas. 

Figure 4 | Mutant-Ctnnb1 mouse medulloblastomas recapitulate the 
molecular characteristics of human WNT-subtype disease. a, AGDEX 
comparison of Ctnnb1-mutant;Tp53flx/flx mouse medulloblastoma, and mouse 
EGL, E16.5 dorsal brainstem (DBS) and human medulloblastoma subgroups. 
b, Unsupervised clustering of human WNT- and SHH-subtype 
medulloblastoma signature orthologue expression in E16.5 DBS, 
Ctnnb1-mutant;Tp53flx/flx mouse medulloblastoma (Ctnnb1 MB), P7 GNPCs and 

developed for cross-species genomic comparisons. Remarkably, the 
transcriptome (n = 11,049 orthologues) of Ctnnb1-mutant;Tp53flx/flx 
medulloblastomas matched only human WNT-subtype medulloblastoma 
and the cells of the embryonic dorsal brainstem (both permuted 
P < 0.05), validating it as a model of this human tumour subtype and 
further pinpointing the brainstem as the source of WNT-subtype 
medulloblastomas (Fig. 4a, b). Finally, because human WNT-subtype 
medulloblastomas selectively delete chromosome 6 (ref. 3), we looked 
in Ctnnb1-mutant mouse medulloblastomas to see if syntenic regions 
of this chromosome are deleted (Fig. 4c). DNA microarray analysis 
identified a single common deletion of mouse chromosome 17 3.2 cM/ 
human 6q25.3 in tumours in the two species. This locus encodes a 
single gene, TULP4, that is a distant member of the tubby-gene family 
implicated in regulating neuronal cell apoptosis. Thus 
Ctnnb1-mutant;Tp53flx/flx mouse medulloblastomas accurately model the 
molecular characteristics of human WNT-subtype tumours and pinpoint 
TULP4 as a novel candidate suppressor gene of these tumours. 
By demonstrating that subtypes of medulloblastoma have distinct cel- 
lular origins, our data should significantly accelerate the hunt for 
curative treatments of these diseases, which must now account for 
the different developmental origins of these tumours. 


METHODS SUMMARY 

MRI analysis. MRI images of patients were spatially normalized into a standard
stereotaxic space for quantitative comparison of tumour location (SPM5;
www.fil.ion.ucl.ac.uk/spm). Radiologists masked to patient subtype determined the
three-dimensional location of the tumour or surgical cavity relative to pre-defined
anatomical landmarks.

Expression mapping. The expression of mouse orthologues of key signature
genes of human WNT- and SHH-subtype medulloblastoma (Supplementary
Data Set 1 and Supplementary Table) was mapped in the developing mouse
hindbrain using four publicly accessible data sets (see Supplementary Methods).

Mouse studies. Blbp-Cre, Ctnnb1lox(ex3), Atoh1-Cre and Tp53flx/flx mice
were bred to generate appropriate genotypes and subjected to clinical surveillance 
for signs of tumour development. RosaYFP and RosaLacZ reporter strains traced
the lineage of Cre-recombined cells. Mouse tumours comprised at least 85% 
tumour cells. Atoh1-Ctnnb1“” transgenic mice were generated by pro-nuclear 
injection. In utero electroporation and cell tracking were performed by anaesthet- 
izing pregnant mice of the appropriate genotype. The uterus was externalized and 
the dorsal brainstem of E12.5 embryos electroporated with CMV-eGFP plasmid 
DNA. GNPCs for culture studies were isolated from postnatal day 7 Atoh1-GFP 
transgenic mice. GFP+ cells (2 × 10° per well) were cultured in poly-D-lysine-
coated 96-well plates and challenged with mutant-Ctnnb1-GFP, control GFP virus,
Wnt1 protein (50 ng ml−1) or Shh supernatant (3 µg ml−1) before pulsing with
[methyl-3H]thymidine and scintillation counting.

Histology, messenger RNA and DNA microarray profiling. Immuno- 
histochemistry was performed using routine techniques and primary antibodies 
of the appropriate tissues as described (Supplementary Methods). Cells under-
going apoptosis were detected with the Apoptag kit (Millipore, S7100). Messenger
RNA expression (GEO accession number GSE24628) and DNA copy number pro- 
files (available at http://stjuderesearch.org/site/authors/gilbertson) were generated 
from mouse and human tissues using appropriate microarray platforms as detailed 
(Supplementary Methods). Reverse transcriptase real-time PCR and gene re- 
sequencing of human medulloblastomas were performed as described previously’. 
Messenger RNA expression and DNA microarray profiles of human and mouse 
medulloblastomas were integrated using established and novel bioinformatic and 
statistical approaches. 
Individuals make choices and prioritize goals using complex processes that assign value to rewards and associated
stimuli. During Pavlovian learning, previously neutral stimuli that predict rewards can acquire motivational
properties, becoming attractive and desirable incentive stimuli. However, whether a cue acts solely as a predictor of
reward, or also serves as an incentive stimulus, differs between individuals. Thus, individuals vary in the degree to which
cues bias choice and potentially promote maladaptive behaviour. Here we use rats that differ in the incentive
motivational properties they attribute to food cues to probe the role of the neurotransmitter dopamine in
stimulus–reward learning. We show that intact dopamine transmission is not required for all forms of learning in which reward
cues become effective predictors. Rather, dopamine acts selectively in a form of stimulus–reward learning in which
incentive salience is assigned to reward cues. In individuals with a propensity for this form of learning, reward cues come
to powerfully motivate and control behaviour. This work provides insight into the neurobiology of a form of
stimulus–reward learning that confers increased susceptibility to disorders of impulse control.


Dopamine is central for reward-related processes1,2, but the exact
nature of its role remains controversial. Phasic neurotransmission
in the mesolimbic dopamine system is initially triggered by the receipt
of reward (unconditional stimulus, US), but shifts to a cue that predicts
a reward (conditional stimulus, CS) after associative learning3,4.
Dopamine responsiveness appears to encode discrepancies between
rewards received and those predicted, consistent with a ‘prediction
error’ teaching signal used in formal models of reinforcement
learning5,6. Therefore, a popular hypothesis is that dopamine is used to
update the predictive value of stimuli during associative learning7. In
contrast, others have argued that the role of dopamine in reward is in
attributing Pavlovian incentive value to cues that signal reward,
rendering them desirable in their own right8–11, and thereby increasing
the pool of positive stimuli that have motivational control over
behaviour. Until now it has been difficult to determine whether dopamine
mediates the predictive or the motivational properties of
reward-associated cues, because these two features are often acquired
together. However, the extent to which a predictor of reward acquires
incentive value differs between individuals, providing the opportunity
to parse the role of dopamine in stimulus–reward learning.
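
The ‘prediction error’ teaching signal referred to above can be illustrated with a minimal sketch. It assumes a Rescorla-Wagner-style update, which is one common formal learning rule; the text does not specify a particular model. The function name, learning rate and reward magnitude are illustrative assumptions and are not values from this study. The sketch shows only that the error at reward delivery shrinks as the cue comes to predict the reward.

```python
def rescorla_wagner(n_trials, alpha=0.2, reward=1.0):
    """Simulate cue value and reward prediction error across CS-US pairings.

    alpha (learning rate) and reward (US magnitude) are hypothetical values
    chosen for illustration only.
    """
    value = 0.0          # predictive value currently assigned to the CS
    history = []
    for _ in range(n_trials):
        error = reward - value           # prediction error at US delivery
        history.append((value, error))
        value += alpha * error           # update the CS value by a fraction of the error
    return history


history = rescorla_wagner(30)
errors = [error for _, error in history]
values = [value for value, _ in history]
print(f"first-trial error: {errors[0]:.3f}, last-trial error: {errors[-1]:.3f}")
```

Under this rule a well-predicted reward generates almost no error, which is the property used later in the text to argue that a stable US-evoked signal is inconsistent with a prediction-error account.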
Individual variation in behavioural responses to reward-associated
stimuli can be seen using one of the simplest reward paradigms,
Pavlovian conditioning. If a CS is presented immediately before US
delivery at a separate location, some animals approach and engage the
CS itself and go to the location of food delivery only upon CS
termination. This conditional response (CR), which is maintained by
Pavlovian contingency12, is called ‘sign-tracking’ because animals are
attracted to the cue or sign that indicates impending reward delivery.
However, other individuals do not approach the CS, but during its
presentation engage the location of US delivery, even though the US is
not available until CS termination. This CR is called ‘goal-tracking’13.
The CS is an effective predictor in animals that learn either a
sign-tracking or a goal-tracking response; it acts as an excitor, evoking a CR
in both. However, only in sign-trackers is the CS an attractive incentive
stimulus, and only in sign-trackers is it strongly desired (that is, ‘wanted’),
in the sense that animals will work avidly to get it14. In rats selectively bred
for differences in locomotor responses to a novel environment15,
high responders to novelty (bHR rats) consistently learn a sign-tracking
CR but low responders to novelty (bLR rats) consistently learn a
goal-tracking CR16. Here, we exploit these predictable phenotypes in the
selectively bred rats, as well as normal variation in outbred rats, to probe
the role of dopamine transmission in stimulus–reward learning in
individuals that vary in the incentive value they assign to reward cues.


Stimulus–reward learning

bHR and bLR rats from the twentieth generation of selective breeding
(S20) were used for behavioural analysis of Pavlovian conditional
approach behaviour16 (Fig. 1a–e). When presentation of a lever-CS
was paired with food delivery both bHR and bLR rats developed a 
Pavlovian CR, but as we have described previously'’, the topography 
of the CR was different in the two groups. With training, bHR rats 
came to rapidly approach and engage the lever-CS (Fig. 1a, b),
whereas upon CS presentation bLR rats came to rapidly approach
and engage the location where food would be delivered (Fig. 1c and
d; see detailed statistics in Supplementary Information). Both bHR
and bLR rats acquired their respective CRs as a function of training,
given that there was a significant effect of number of sessions for all
measures of sign-tracking behaviour for bHR rats (Fig. 1a, b;
P = 0.0001), and of goal-tracking behaviour for bLR rats (Fig. 1c, d;
P = 0.0001). Furthermore, bHR and bLR rats learned their respective
CRs at the same rate, as indicated by analyses of variance in which 
session was treated as a continuous variable and the phenotypes were 
directly compared. There were non-significant phenotype × session
interactions for (1) the number of contacts with the lever-CS for bHR
rats versus the food-tray for bLR rats (F(1, 236) = 3.02, P = 0.08) and
(2) the latency to approach the lever-CS for bHR rats versus the


Molecular and Behavioral Neuroscience Institute, University of Michigan, Michigan, USA. Department of Psychiatry and Behavioral Sciences and Department of Pharmacology, University of Washington, Washington, USA. Department of Psychology, University of Michigan, Michigan, USA.
*These authors contributed equally to this manuscript.


00 MONTH 2010 | VOL 000 | NATURE | 1 


©2010 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


[Figure 1 graphic. Panel titles: a, Sign-tracking; c, Goal-tracking. Axis labels: Latency (s); Session (2–12).]

Figure 1 | Development of sign-tracking versus goal-tracking CRs in bHR
and bLR rats. Behaviour directed towards the lever-CS (sign-tracking) is shown
in a and b and behaviour directed towards the food-tray (goal-tracking) is shown
in c and d (n = 10 per group). Data are shown as mean ± s.e.m. a, Number of
lever-CS contacts made during the 8-s CS period. b, Latency to the first lever-CS
contact. c, Number of food-tray beam breaks during lever-CS presentation.
d, Latency to the first beam break in the food-tray during lever-CS presentation.
For all of these measures (a–d) there was a significant effect of phenotype,
session, and a phenotype × session interaction (P = 0.0001). e, Probability of


food-tray for bLR rats (F(1, 236) = 0.93, P = 0.34). Importantly, rats
that received non-contiguous (pseudorandom) presentations of the
CS and the US did not learn either a sign-tracking or a goal-tracking
CR (Fig. 1e).

These data indicate that the CS acquired one defining property of 
an incentive stimulus in bHR rats but not bLR rats: the ability to 
attract. Another feature of an incentive stimulus is to be ‘wanted’ 
and as such animals should work to obtain it''”. Therefore, we quan- 
tified the ability of the lever-CS to serve as a conditioned reinforcer in 
the two groups (Fig. 1f, g) in the absence of the food-US. Following 
Pavlovian training, rats were given the opportunity to perform an 
instrumental response (a nosepoke) for presentation of the lever- 
CS. Responses into a port designated ‘active’ resulted in the brief 
presentation of the lever-CS and responses into an ‘inactive’ port were 
without consequence. Both conditioned bHR and bLR rats made 
more active than inactive nose pokes, and more active nose pokes 
than control groups that received pseudorandom presentations of the 
CS and the US (Fig. 1f, g; detailed statistics in Supplementary 
Information). However, the lever-CS was a more effective conditioned
reinforcer in bHR rats than in bLR rats, as indicated by a
significant phenotype × group interaction for active nose pokes
(F(1, 33) = 4.82, P = 0.04), which controls for basal differences in
nosepoke responding. Moreover, in outbred rats in which this baseline
difference in responding does not exist, we have found similar results,
indicating that the lever-CS is a more effective conditioned reinforcer
for sign-trackers than goal-trackers14. In summary, the lever-CS was
equally predictive, evoking a CR in both groups, but it acquired two
properties of an incentive stimulus to a greater degree in bHR rats
than bLR rats: it was more attractive, as indicated by approach
behaviour (Fig. 1a) and more desirable, as indicated by its ability to serve as
a conditioned reinforcer (Fig. 1f, g).


Dopamine signalling during stimulus–reward learning

The core of the nucleus accumbens is an important anatomical
substrate for motivated behaviour18,19 and has been specifically implicated
as a site where dopamine acts to mediate the acquisition and/or
performance of Pavlovian conditional approach behaviour20–23. Therefore,
approach to the lever minus the probability of approach to the food-tray shown
as mean ± s.e.m. A score of zero indicates that neither approach to the lever-CS
nor approach to the food-tray was dominant. f, g, Test for conditioned
reinforcement illustrated as the number (mean ± s.e.m.) of active and inactive
nosepokes in bred rats that received either paired (bHR rats, n = 10; bLR rats,
n = 9) or pseudorandom (bHR rats, n = 9; bLR rats, n = 9) CS-US
presentations. Rats in the paired groups poked more in the active port than did
random groups of the same phenotype (*P < 0.02), but the magnitude of this
effect was greater for bHR rats (phenotype × group interaction, P = 0.04).
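
The approach score described in the Figure 1e legend (probability of approach to the lever minus probability of approach to the food-tray) can be written out as a short sketch. The function name and the trial counts in the usage lines are hypothetical; the legend states only the definition of the score and that zero means neither response dominates.

```python
def approach_score(lever_trials, tray_trials, total_trials):
    """Probability of lever approach minus probability of food-tray approach.

    lever_trials / tray_trials: number of trials with at least one contact
    during the CS (hypothetical inputs). The score ranges from -1
    (food-tray approach only) to +1 (lever approach only); 0 means neither
    response is dominant.
    """
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    if not (0 <= lever_trials <= total_trials and 0 <= tray_trials <= total_trials):
        raise ValueError("trial counts must lie between 0 and total_trials")
    p_lever = lever_trials / total_trials
    p_tray = tray_trials / total_trials
    return p_lever - p_tray


# Hypothetical sessions of 25 trials each
sign_tracker_like = approach_score(20, 5, 25)   # positive: lever-directed
goal_tracker_like = approach_score(3, 22, 25)   # negative: tray-directed
neutral = approach_score(10, 10, 25)            # zero: neither dominant
```

A positive score corresponds to a sign-tracking bias and a negative score to a goal-tracking bias.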


we used fast-scan cyclic voltammetry (FSCV) at carbon-fibre
microelectrodes24 to characterize the pattern of phasic dopamine signalling
in this region during Pavlovian conditioning (see Supplementary Fig. 
1 for recording locations). Similarly to surgically naive animals, bHR
rats learned a sign-tracking CR (session effect on lever contacts:
F(5, 20) = 5.76, P = 0.002) and bLR rats learned a goal-tracking CR
(session effect on food-receptacle contacts: F(5, 29) = 5.18, P = 0.003)
during neurochemical data collection (Supplementary Fig. 2).
Changes in latency during learning were very similar in each group
for their respective CRs (main effect of session: F(5, 49) = 10.5,
P < 0.0001; main effect of phenotype: F(1, 8) = 0.13, P = 0.73; session
× phenotype interaction: F(5, 49) = 1.16, P = 0.35), indicating that the
CS acts as an equivalent predictor of reward in both groups. Therefore,
if CS-evoked dopamine release encodes the strength of the reward 
prediction, as previously postulated*’, it should increase to a similar 
degree in both groups during learning; however, if it encodes the 
attribution of incentive value to the CS, then it should increase to a 
greater degree in sign-trackers than in goal-trackers. During the 
acquisition of conditional approach, CS-evoked dopamine release 
(Fig. 2 and Supplementary Fig. 3) increased in bHR rats relative to 
unpaired controls (pairing × session interaction: F(5, 35) = 4.58,
P = 0.003), but there was no such effect in bLR rats (Fig. 2 and
Supplementary Fig. 3; pairing × session interaction: F(5, 35) = 0.94,
P = 0.46). Indeed, the trial-by-trial correlation between CS-evoked
dopamine release and trial number was significant for bHR rats
(r² = 0.14, P < 0.0001) but not bLR rats (r² = 0.003, P = 0.54),
producing significantly different slopes (P = 0.005) and higher
CS-evoked dopamine release in bHR rats after acquisition
(Supplementary Fig. 4, session 6; P = 0.04). US-evoked dopamine release also
differed between bHR and bLR rats during training (session × phenotype
interaction: F(5, 49) = 6.09, P = 0.0003), but for this stimulus
dopamine release was lower after acquisition in bHR rats (session 6;
P = 0.002; Supplementary Fig. 4). Collectively, these data highlight
that bHR and bLR rats produce fundamentally different patterns of
dopamine release in response to reward-related stimuli during
learning (see Supplementary Videos 1 and 2). The CS and US signals
diverge in bHR rats (stimulus × session interaction: F(5, 49) = 5.47,
P = 0.0006; Fig. 2c) but not bLR rats (stimulus × session interaction:
F(5, 49) = 0.28, P = 0.92; Fig. 2f).
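
The trial-by-trial analysis above relates CS-evoked dopamine release to trial number and compares the resulting slopes. As a minimal sketch of that kind of calculation, assuming an ordinary least-squares fit (the exact procedure is given only in the Supplementary Information), the slope and the squared correlation can be computed as follows. The function name and both data series are synthetic and purely illustrative; they are not measurements from this study.

```python
def slope_and_r_squared(xs, ys):
    """Least-squares slope of ys on xs and the squared Pearson correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return sxy / sxx, (sxy ** 2) / (sxx * syy)


trials = list(range(1, 11))

# Synthetic series: one signal that grows with training, one that does not
rising = [1.0 + 0.5 * t for t in trials]
flat = [2.0 + (0.1 if t % 2 else -0.1) for t in trials]

rising_slope, rising_r2 = slope_and_r_squared(trials, rising)
flat_slope, flat_r2 = slope_and_r_squared(trials, flat)
```

A series that grows across trials yields a clearly positive slope and a large r², whereas a series that merely fluctuates yields a slope near zero and a small r².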

Importantly, experiments conducted in commercially obtained 
outbred rats reproduced the pattern of dopamine release observed in 
the selectively bred rats (Fig. 3 and Supplementary Fig. 4). Specifically, 
there was an increase in CS-evoked and a decrease in US-evoked dopa- 
mine release during learning in outbred rats that learned a sign-tracking 
CR (stimulus × session interaction: F(5, 59) = 4.43, P = 0.002; Fig. 3d),
but not in those that learned a goal-tracking CR (stimulus × session
interaction: F(5, 49) = 0.48, P = 0.72; Fig. 3f). To test the robustness of
these patterns of dopamine release, a subset of outbred rats received 
extended training. During four additional sessions, the profound differ- 
ences in dopamine release between sign- and goal-trackers were stable 
(Supplementary Fig. 5), demonstrating that these differences are not 
limited to the initial stages of learning. The consistency of these dopa- 
mine patterns in selectively bred and outbred rats indicates that they are 
neurochemical signatures for sign- and goal-trackers rather than an 
artefact of selective breeding. 


Stimulus-reward learning under dopamine blockade 


Given the disparate patterns of dopamine signalling observed during 
learning a sign- versus goal-tracking CR, we tested whether the acquisi- 
tion and performance of these CRs were differentially dependent on 
dopamine transmission. Systemic administration of flupenthixol, a 
nonspecific dopamine receptor antagonist, attenuated performance of 
the CR for both bHR and bLR rats. This effect was clearly evident when 
the antagonist was administered during training (Fig. 4, sessions 1-7). It 
was also observed after the rats had already acquired their respective CR 
(Supplementary Fig. 6), but this latter finding needs to be interpreted 
cautiously because of a non-specific effect on activity (Supplementary 
Fig. 6e). More importantly, when examined off flupenthixol during the 
eighth test session, bHR rats still failed to demonstrate a sign-tracking 
CR (P=0.01 versus saline, session 8; Fig. 4a-c), indicating that 
dopamine is necessary for both the performance and the learning of a 
sign-tracking CR, consistent with previous findings’. In contrast, 
flupenthixol had no effect on learning the CS-US association that leads
to a goal-tracking CR (P=0.6 versus saline, session 8; Fig. 4d-f), 
because on the drug-free session bLR rats showed a fully developed 
goal-tracking CR—their session 8 performance differed significantly 
from their session 1 performance (P = 0.0002). Further, they differed 
from the bLR saline group on session 1 (P = 0.0001), but did not differ 
from the bLR saline group on session 8. Thus, whereas dopamine may be 
necessary for the performance of both sign-tracking and goal-tracking
CRs, it is only necessary for acquisition of a sign-tracking CR, indicating
that these forms of learning are mediated by distinct neural systems.

Collectively, these data provide several lines of evidence demon- 
strating that dopamine does not act as a universal teaching signal in 
stimulus-reward learning, but selectively participates in a form of 
stimulus—reward learning whereby Pavlovian incentive value is attri- 
buted to a CS. First, US-evoked dopamine release in the nucleus 
accumbens decreased during training in sign-trackers, but not in 
goal-trackers. Thus, during the acquisition of a goal-tracking CR, 
there is not a dopamine-mediated prediction-error teaching signal 
because, by definition, prediction errors become smaller as delivered 
rewards become better predicted. Second, the CS evoked dopamine 
release in both sign- and goal-tracking rats, but this signal increased to 
a greater degree in sign-trackers, which attributed incentive salience 
to the CS. These data indicate that the strength of the CS-US asso- 
ciation is reflected by dopamine release to the CS only in some forms 
of stimulus-reward learning. Third, bHR rats that underwent 
Pavlovian training in the presence of a dopamine receptor antagonist 
did not acquire a sign-tracking CR, consistent with previous reports’; 
however, dopamine antagonism had no effect on learning a goal- 
tracking CR in bLR rats. Thus, learning a goal-tracking CR does not 
require intact dopamine transmission, whereas learning a sign-tracking 
CR does. 

The attribution of incentive salience is the product of previous 
experience (that is, learned associations) interacting with an indivi- 
dual’s genetic propensity and neurobiological state*'””°-”’. The selec- 
tively bred rats used in this study have distinctive behavioural 
phenotypes, including greater behavioural disinhibition and reduced 
impulse control in bHR rats'®. Moreover, in these lines, unlike in 
outbred rats'*”®, there is a strong correlation between locomotor res- 
ponse to novelty and the tendency to sign-track'®. These behavioural 
phenotypes are accompanied by baseline differences in dopamine 
transmission, with bHR rats showing elevated sensitivity to dopamine 
agonists, increased proportion of striatal D2 receptors in a high-affinity 
state, greater frequency of spontaneous dopamine transients’®, and 
higher reward-related dopamine release before conditioning, all of 
which could enhance their attribution of incentive salience to reward 
cues29,30. However, basal differences in dopaminergic tone do not
provide the full explanation for differences in learning styles and
associated dopamine responsiveness. Outbred rats with similar baseline
locomotor activity't and similar baseline levels of reward-related 
Figure 3 | Conditional responses and phasic dopamine signalling in
response to CS and US presentation in outbred rats. Phasic dopamine release
was recorded in the core of the nucleus accumbens using FSCV across six days
of training. a, b, Behaviour directed towards the lever-CS (sign-tracking)
(a) and behaviour directed towards the food-tray (goal-tracking) (b) during
conditioning. Learning was evident in both groups because there was a
significant effect of session both for rats that learned a sign-tracking response
(n = 6; session effect on lever contacts: F(5, 25) = 11.85, P = 0.0001) and for rats
that learned a goal-tracking response (n = 5; session effect on food-receptacle
contacts: F(5, 29) = 3.09, P = 0.03). c, e, Change in dopamine concentration
(mean ± s.e.m.) in response to CS and US presentation for each session of
conditioning. d, f, Change in peak amplitude (mean ± s.e.m.) of the dopamine
signal observed in response to CS and US presentation for each session of
conditioning. (Bonferroni post-hoc comparison between CS- and US-evoked
dopamine release: *P < 0.05; **P < 0.01). Panels c and d demonstrate that
animals developing a sign-tracking CR (n = 6) show increasing phasic
dopamine responses to CS presentation and decreasing responses to US
presentation consistent with bHR rats. Panels e and f demonstrate that animals
developing a goal-tracking CR (n = 5) maintain phasic responses to US
presentation consistent with bLR rats.
Figure 4 | Dopamine is necessary for learning CS-US associations that lead
to sign-tracking, but not goal-tracking. a–c, The effects of flupenthixol on
sign-tracking. a, Probability of approaching the lever-CS. b, Number of
contacts with the lever-CS. c, Latency to contact the lever-CS. d–f, The effects of
flupenthixol on goal-tracking. d, Probability of approaching the food-tray
during lever-CS presentation. e, Number of contacts with the food-tray during
lever-CS presentation. f, Latency to contact the food-tray during lever-CS
presentation. Data are expressed as mean ± s.e.m. Flupenthixol (sessions 1-7)
blocked the performance of both sign-tracking and goal-tracking CRs. To 
determine whether flupenthixol influenced performance or learning of a CR, 
behaviour was examined following a saline injection on session 8 for all rats. 
bLR rats that were treated with flupenthixol before sessions 1-7 (n = 16) 
responded similarly to the bLR saline group (n = 10) on all measures of goal- 
tracking behaviour on session 8, whereas bHR rats treated with flupenthixol 
(n = 22) differed significantly from the bHR saline group (n = 10) on session 8 
(*P < 0.01, saline versus flupenthixol). Thus, bLR rats learned the CS-US 
association that produced a goal-tracking CR even though the drug prevented 
the expression of this behaviour during training. Parenthetically, bHR rats 
treated with flupenthixol did not develop a goal-tracking CR. 


dopamine release in the nucleus accumbens (see Fig. 3), differ in 
whether they are prone to learn a sign-tracking or goal-tracking CR, 
but they still develop patterns of dopamine release specific to that CR. 
Therefore, it appears that different mechanisms control basal dopa- 
mine neurotransmission versus the unique pattern of dopamine 
responsiveness to a reward cue. 

The neural mechanisms underlying sign- and goal-tracking beha- 
viour remain to be elucidated. Here we have shown that stimulus— 
reward associations that produce different CRs are mediated by 
different neural circuitry. Previous research using site-specific dopamine
antagonism21 and dopamine-specific lesions22 indicated that
dopamine acts in the nucleus accumbens core to support the learning 
and performance of sign-tracking behaviour. This work demonstrates 
that dopamine-encoded prediction-error signals are indeed present in 
the nucleus accumbens of sign-trackers, but not in the nucleus accum- 
bens of goal-trackers. Although these neurochemical data alone do not 
rule out the possibility that prediction-error signals are present in other 
dopamine terminal regions, the results from systemic dopamine
antagonism demonstrate that intact dopamine transmission is 
generally not required for learning of a goal-tracking CR. 

We thus show that dopamine is an integral part of stimulus—reward 
learning that is specifically associated with the attribution of incentive 
salience to reward cues. Individuals who attribute reward cues with 
incentive salience find it more difficult to resist such cues, a feature 
associated with reduced impulse control'**'. Human motivated beha- 
viour is subject to a wide span of individual differences ranging from 
highly deliberative to highly impulsive actions directed towards the 
acquisition of rewards”. This work provides insight into the biological 
basis of these individual differences, and may provide an important 
step for understanding and treating impulse-control problems that 
are prevalent across several psychiatric disorders. 


METHODS SUMMARY 


The majority of these studies were conducted with adult male Sprague-Dawley rats
from a selective-breeding colony which has been previously described15. The data
presented here were obtained from bHR and bLR rats from generations S18–S22.
Equipment and procedures for Pavlovian conditioning have been described in 
detail elsewhere'*”*. Selectively bred rats from generations S18, S20 and S21 were
transported from the University of Michigan to the University of Washington for 
the FSCV experiments. During each behaviour session, chronically implanted 
microsensors, placed in the core of the nucleus accumbens, were connected to a 
head-mounted voltammetric amplifier for detection of dopamine by FSCV*. 
Voltammetric scans were repeated every 100 ms to obtain a sampling rate of
10 Hz. Voltammetric analysis was carried out using software written in LabVIEW 
(National Instruments). On completion of the FSCV experiments, recording sites 
were verified using standard histological procedures. To examine the effects of 
flupenthixol (Sigma; dissolved in 0.9% NaCl) on the performance of sign-tracking
and goal-tracking behaviour, rats received an injection (intraperitoneal, i.p.) of 150,
300 or 600 µg kg−1 of the drug one hour before Pavlovian conditioning sessions 9,
11 and 13. Doses of the drug were counterbalanced between groups and interspersed
with saline injections (i.p., 0.9% NaCl; before sessions 8, 10, 12 and 14) to prevent
any cumulative drug effects. To examine the effects of flupenthixol on the
acquisition of sign-tracking and goal-tracking behaviour, rats received an injection (i.p.) of
either saline or 225 µg kg−1 of the drug one hour before Pavlovian conditioning
sessions 1-7.
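
The performance-phase dosing design described above (three doses on sessions 9, 11 and 13, interspersed with saline on sessions 8, 10, 12 and 14) can be laid out as a small sketch. The text says only that doses were counterbalanced between groups; the cyclic rotation used here is an assumption made for illustration, as are the function and constant names.

```python
DOSES_UG_PER_KG = [150, 300, 600]     # flupenthixol doses stated in the text
DRUG_SESSIONS = [9, 11, 13]           # sessions preceded by a drug injection
SALINE_SESSIONS = [8, 10, 12, 14]     # sessions preceded by a saline injection


def dose_schedule(group_index):
    """Return {session: treatment} for one group.

    Assumption: the dose order is rotated cyclically by group index, which
    is one simple way to counterbalance; the actual scheme is not stated.
    """
    shift = group_index % len(DOSES_UG_PER_KG)
    order = DOSES_UG_PER_KG[shift:] + DOSES_UG_PER_KG[:shift]
    schedule = {session: "saline" for session in SALINE_SESSIONS}
    schedule.update(zip(DRUG_SESSIONS, order))
    return dict(sorted(schedule.items()))


schedules = [dose_schedule(g) for g in range(3)]
for g, schedule in enumerate(schedules):
    print(g, schedule)
```

With three groups, each dose then falls once on each drug session, and every drug session is both preceded and followed by a saline session.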


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


Received 3 May; accepted 15 October 2010. 
Published online 8 December 2010. 


1. Schultz, W. Behavioral theories and the neurophysiology of reward. Annu. Rev. 
Psychol. 57, 87-115 (2006). 

2. Wise, R. A. Dopamine, learning and motivation. Nature Rev. Neurosci. 5, 483-494 
(2004). 

3. Day, J. J., Roitman, M. F., Wightman, R. M. & Carelli, R. M. Associative learning 
mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nature 
Neurosci. 10, 1020-1028 (2007). 

4. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and 
reward. Science 275, 1593-1599 (1997). 

5. Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic 
dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 
1936-1947 (1996). 

6. Waelti, P., Dickinson, A. & Schultz, W. Dopamine responses comply with basic
assumptions of formal learning theory. Nature 412, 43-48 (2001). 

7. Balleine, B. W., Daw, N. D. & O’Doherty, J. P. in Neuroeconomics: Decision Making
and the Brain (eds Glimcher, P. W., Camerer, C. F., Fehr, E. & Poldrack, R. A.)
367-389 (Academic Press, 2008).

8. Berridge, K. C. The debate over dopamine’s role in reward: the case for incentive
salience. Psychopharmacology 191, 391-431 (2007).

9. Berridge, K. C. & Robinson, T. E. What is the role of dopamine in reward: hedonic
impact, reward learning, or incentive salience? Brain Res. Brain Res. Rev. 28,
309-369 (1998).

10. Berridge, K. C., Robinson, T. E. & Aldridge, J. W. Dissecting components of reward:
‘liking’, ‘wanting’, and learning. Curr. Opin. Pharmacol. 9, 65-73 (2009).

11. Panksepp, J. Affective consciousness: core emotional feelings in animals and
humans. Conscious. Cogn. 14, 30-80 (2005).




12. Hearst, E. & Jenkins, H. Sign-Tracking: The Stimulus-Reinforcer Relation and Directed
Action (Monograph of the Psychonomic Society, 1974). 

13. Boakes, R. in Operant-Pavlovian Interactions (eds Davis, H. & Hurwitz, H. M. B.) 
67-97 (Erlbaum, 1977). 

14. Robinson, T. E. & Flagel, S. B. Dissociating the predictive and incentive motivational
properties of reward-related cues through the study of individual differences. Biol. 
Psychiat. 65, 869-873 (2009). 

15. Stead, J. D. et al. Selective breeding for divergence in novelty-seeking traits: 
heritability and enrichment in spontaneous anxiety-related behaviors. Behav. 
Genet. 36, 697-712 (2006). 

16. Flagel, S. B. et al. An animal model of genetic vulnerability to behavioral
disinhibition and responsiveness to reward-related cues: implications for 
addiction. Neuropsychopharmacology 35, 388-400 (2010). 

17. Berridge, K. C. in Psychology of Learning and Motivation (ed. Medin, D. L.) 223-278 
(Academic Press, 2001). 

18. Cardinal, R. N., Parkinson, J. A., Hall, J. & Everitt, B. J. Emotion and motivation: the 
role of the amygdala, ventral striatum, and prefrontal cortex. Neurosci. Biobehav. 
Rev. 26, 321-352 (2002). 

19. Kelley, A. E. Functional specificity of ventral striatal compartments in appetitive 
behaviors. Ann. NY Acad. Sci. 877, 71-90 (1999). 

20. Dalley, J. W. et al. Time-limited modulation of appetitive Pavlovian memory by D1 
and NMDA receptors in the nucleus accumbens. Proc. Natl Acad. Sci. USA 102,
6189-6194 (2005). 

21. Di Ciano, P., Cardinal, R. N., Cowell, R. A., Little, S. J. & Everitt, B. J. Differential
involvement of NMDA, AMPA/kainate, and dopamine receptors in the nucleus
accumbens core in the acquisition and performance of pavlovian approach
behavior. J. Neurosci. 21, 9471-9477 (2001). 

22. Parkinson, J. A. et al. Nucleus accumbens dopamine depletion impairs both 
acquisition and performance of appetitive Pavlovian approach behaviour: 
implications for mesoaccumbens dopamine function. Behav. Brain Res. 137, 
149-163 (2002). 

23. Parkinson, J. A., Olmstead, M. C., Burns, L. H., Robbins, T. W. & Everitt, B. J. 
Dissociation in effects of lesions of the nucleus accumbens core and shell on 
appetitive pavlovian approach behavior and the potentiation of conditioned 
reinforcement and locomotor activity by D-amphetamine. J. Neurosci. 19, 
2401-2411 (1999). 

24. Clark, J. J. et al. Chronic microsensors for longitudinal, subsecond dopamine 
detection in behaving animals. Nature Methods 7, 126-129 (2010). 

25. Robinson, T. E. & Berridge, K. C. The neural basis of drug craving: an incentive- 
sensitization theory of addiction. Brain Res. Brain Res. Rev. 18, 247-291 (1993). 

26. Tindell, A. J., Smith, K. S., Berridge, K. C. & Aldridge, J. W. Dynamic computation of 
incentive salience: "wanting" what was never "liked". J. Neurosci. 29, 
12220-12228 (2009). 

27. Zhang, J., Berridge, K. C., Tindell, A. J., Smith, K. S. & Aldridge, J. W. A neural 
computational model of incentive salience. PLOS Comput. Biol. 5, e1000437 
(2009). 

28. Beckmann, J. S., Marusich, J. A., Gipson, C. D. & Bardo, M. T. Novelty seeking,
incentive salience and acquisition of cocaine self-administration in the rat. Behav. 
Brain Res. 216, 159-165 (2011). 

29. Wyvell, C. L. & Berridge, K. C. Intra-accumbens amphetamine increases the 
conditioned incentive salience of sucrose reward: enhancement of reward 
"wanting" without enhanced "liking" or response reinforcement. J. Neurosci. 20, 
8122-8130 (2000). 

30. Wyvell, C. L. & Berridge, K. C. Incentive sensitization by previous amphetamine 
exposure: increased cue-triggered "wanting" for sucrose reward. J. Neurosci. 21, 
7831-7840 (2001). 

31. Tomie, A., Aguado, A. S., Pohorecky, L. A. & Benjamin, D. Ethanol induces impulsive-
like responding in a delay-of-reward operant choice procedure: impulsivity
predicts autoshaping. Psychopharmacology 139, 376-382 (1998). 

32. Kuo, W. J., Sjostrom, T., Chen, Y. P., Wang, Y. H. & Huang, C. Y. Intuition and 
deliberation: two systems for strategizing in the brain. Science 324, 519-522 
(2009). 


Supplementary Information is linked to the online version of the paper at
www.nature.com/nature. 


Acknowledgements This work was supported by National Institutes of Health grants
R01-MH079292 (to P.E.M.P.), R01-DA027858 (to P.E.M.P.), T32-DA07278 (to J.J.C.),
F32-DA24540 (to J.J.C.), R37-DA04294 (to T.E.R.) and 5P01-DA021633-02 (to T.E.R.
and H.A.). The selective breeding colony was supported by a grant from the Office of
Naval Research to H.A. (N00014-02-1-0879). We thank K. Berridge and J. Morrow for
comments on earlier versions of the manuscript, and S. Ng-Evans for technical support.


Author Contributions S.B.F., J.J.C., T.E.R., P.E.M.P. and H.A. designed the experiments
and wrote the manuscript. S.B.F., J.J.C., L.M., A.C., I.W. and C.A.A. conducted the
experiments, S.M.C. oversaw the selective breeding colony, and S.B.F. and J.J.C.
analysed the data.


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of this article at 
www.nature.com/nature. Correspondence and requests for materials should be 
addressed to P.E.M.P. (pemp@uw.edu) or H.A. (akil@umich.edu).



METHODS 


Animals. Adult male Sprague-Dawley rats selectively bred for reactivity to a novel 
environment were used for the majority of these studies'*. The data presented 
here were obtained from bHR and bLR rats from generations S18 to S22. The
experiments followed the Guidelines for the Care and Use of Mammals in 
Neuroscience and Behavioural Research (National Research Council 2003) and 
the procedures were approved by the University Committee on the Use and Care 
of Animals. Unless otherwise indicated, rats were housed in pairs and kept on a 
12-h light/12-h dark cycle (lights on at 06:00 h) with controlled temperature and
humidity; food and water were available ad libitum.

Voltammetry studies were conducted at the University of Washington using 
bHR and bLR rats from generations S18, S20 and S21, as well as male Sprague-
Dawley rats obtained from Charles River weighing between 300 g and 350 g upon
arrival. These rats were housed individually and kept on a 12-h light/12-h dark
cycle (lights on at 07:00 h) with controlled temperature and humidity. Prior to
behavioural training, food was restricted so that rats maintained 90% of their 
free-feeding body weight and water was available ad libitum. All animal proce- 
dures followed the University of Washington Institutional Animal Care and Use 
Committee guidelines. 

Screening for selectively bred phenotypes. To confirm the selectively bred phe- 
notypes, each generation of rats were screened for locomotor activity in novel test 
chambers at around 60 days of age, as previously described'*”’. 

Pavlovian conditioning procedures. Equipment and procedures for Pavlovian 
conditioning have been described in detail elsewhere’*"*. Briefly, standard Med 
Associates test chambers were equipped with a food-tray located in the middle of 
the front wall and a retractable lever located to the left or right of the food-tray 
(counterbalanced). The lever required only a 10-g force to operate, such that most 
contacts with the lever were detected and recorded as a ‘lever press’. Operation of 
the pellet dispenser (Med Associates) delivered one 45-mg banana-flavoured food 
pellet (Bio-Serv) into the food-tray. Head entries into the food-tray were recorded 
each time the rat broke a photobeam located inside the receptacle. 

All Pavlovian training sessions were conducted between 13:00 h and 18:00 h.
Banana-flavoured food pellets were placed into the rats’ home cages for 2 days 
before training to familiarize the animals with this food (the unconditioned 
stimulus, US). Two pre-training sessions were conducted that consisted of the 
delivery of 50 food pellets, which were randomly delivered on a variable-interval 
30-s schedule (25-min session), during which it was determined whether the rats 
were reliably retrieving the food pellets. Following pre-training sessions, 
Pavlovian training sessions consisted of the presentation of the illuminated lever 
(conditioned stimulus, CS) in the chamber for 8 s, and then immediately upon its 
retraction a 45-mg food pellet (US) was delivered into the food-tray (the ‘goal’). 
The CS was presented on a random-interval 90-s schedule and each Pavlovian 
training session consisted of 25 trials (or CS-US pairings). Training continued for 
6-12 sessions. Rats in the ‘random’ groups received presentations of the CS and 
US, each on a variable-interval 90-s schedule. 

The following events were recorded using Med Associates software: (1) the 
number of lever-CS contacts, (2) the latency to the first lever-CS contact, (3) the 
number of food-tray entries during lever-CS presentation, and (4) the latency to 
the first food-tray entry during lever-CS presentation. It is important to note that no 
response is required for the rat to receive the reward (US), yet distinct CRs emerge as 
a result of Pavlovian conditioning. The outcome measures listed above allow us to 
examine CS-directed (sign-tracking) versus goal-directed (goal-tracking) res- 
ponses. Using these measures we calculated the probability that a rat would 
approach the lever-CS or the food-tray as well as the difference in its probability 
of approaching the lever-CS versus the food-tray. 
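
The last calculation can be illustrated with a minimal sketch. It assumes that, for one rat in one session, the recorded events have already been reduced to per-trial counts of lever-CS contacts and food-tray entries during the CS period; the function name and this input layout are hypothetical and do not reflect the Med Associates output format.

```python
def approach_probabilities(lever_contacts, tray_entries):
    """Return (P(lever-CS approach), P(food-tray approach), difference).

    lever_contacts[k] and tray_entries[k] are the counts recorded during
    CS presentation on trial k (hypothetical input layout).
    """
    n_trials = len(lever_contacts)
    if n_trials == 0 or n_trials != len(tray_entries):
        raise ValueError("both lists must cover the same non-zero number of trials")
    # A trial counts as an approach if at least one contact or entry occurred.
    p_lever = sum(1 for c in lever_contacts if c > 0) / n_trials
    p_tray = sum(1 for c in tray_entries if c > 0) / n_trials
    # Positive difference: sign-tracking bias; negative: goal-tracking bias.
    return p_lever, p_tray, p_lever - p_tray
```

With 25 trials per session, a rat that contacts the lever on 20 trials and enters the food-tray on the other 5 would score 0.8, 0.2 and a difference of 0.6.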

Statistical analysis of Pavlovian conditional responses. Differences in the con- 
ditional response that emerged across training sessions were analysed using linear 
mixed effects models (SPSS 17.0; see also ref. 34), in which phenotype and session 
were treated as independent variables. In addition, the effect of session for each 
phenotype was analysed separately. For all analyses, the covariance structure was 
explored and modelled appropriately. When significant main effects or interactions
were detected, Bonferroni post-hoc comparisons were made. The differences in the
probability of approaching the lever-CS versus the food-tray (Fig. 1e)
were further examined using one-sample t-tests (with hypothesized value of 0) to 
determine whether either phenotype exhibited a preference for the lever-CS or the 
food-tray. 
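
As a sketch of the one-sample test described above, the statistic for a set of per-rat probability differences against a hypothesized value of 0 could be computed as follows. This is only an illustration in plain Python (the analyses themselves were run in SPSS); the function name is an assumption, and converting the statistic to a P value would need a t-distribution table or a statistics package.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("the t statistic is undefined when all values are identical")
    standard_error = sd / math.sqrt(n)
    return (statistics.fmean(values) - mu0) / standard_error, n - 1
```

A positive statistic would indicate a preference for the lever-CS and a negative one a preference for the food-tray.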

Conditioned reinforcement test. The conditioned reinforcement test occurred 
one day after the last of 12 Pavlovian training sessions. The conditioned re- 
inforcement test was conducted in the same standard Med Associates chambers 
as described above. However, for the purposes of this test the chambers were 
rearranged such that the retractable lever was placed in the centre of the front wall 
in between two nosepoke ports. The ‘active’ port was placed on the side of the wall 
opposite to the location of the lever-CS during Pavlovian training. During the 


40-min conditioned reinforcement test nosepokes into the port designated 
‘active’ resulted in the 2-s presentation of the illuminated lever, whereas pokes 
into the other ‘inactive’ port were without consequence. The number of nose- 
pokes into the active and inactive ports and the number of contacts with the lever 
were recorded throughout the test session. 

Statistical analysis of conditioned reinforcement. Performance on the condi- 
tioned reinforcement test was analysed using a three-way analysis of variance 
(ANOVA) in which phenotype, group (paired versus unpaired) and port (active 
versus inactive) were treated as independent variables and the number of pokes as 
the dependent variable. Further analyses were then conducted to determine the effect 
of group or port for each phenotype and the effect of phenotype or group for each port.

FSCV. The following procedures were in accordance with the University of
Washington Institutional Animal Care and Use Committee guidelines. Surgical
preparation for in vivo voltammetry used an aseptic technique. Rats were anaesthetized
with isoflurane and placed in a stereotaxic frame. The scalp was swabbed with 10%
povidone iodine, bathed with a mixture of lidocaine (0.5 mg kg⁻¹) and bupivacaine
(0.5 mg kg⁻¹), and incised to expose the cranium. Holes were drilled and cleared of
dura mater above the nucleus accumbens core (1.3-mm lateral and 1.3-mm rostral 
from the bregma), and at convenient locations for a reference electrode and three 
anchor screws. The reference electrode and anchor screws were positioned and 
secured with cranioplastic cement, leaving the working electrode holes exposed. 
Once the cement cured, the microsensors were attached to the voltammetric amplifier 
and lowered into the target recording regions (the core of the nucleus accumbens, 7.0- 
mm ventral of dura mater). Finally, cranioplastic cement was applied to the part of the 
cranium still exposed to secure the working electrode. 

Voltammetric measurement. During all experimental sessions, chronically 
implanted microsensors were connected to a head-mounted voltammetric amplifier
for dopamine detection by FSCV™*. Voltammetric scans were repeated every 100 ms 
to obtain a sampling rate of 10 Hz. When dopamine is present at the surface of the 
electrode during a voltammetric scan, it is oxidized during the anodic sweep to form 
dopamine-o-quinone (peak reaction at approximately +0.7 V), which is reduced 
back to dopamine in the cathodic sweep (peak reaction at approximately −0.3 V).
The ensuing flux of electrons is measured as current and is directly proportional to 
the number of molecules that undergo the electrolysis. The redox current obtained 
from each scan provides a chemical signature that is characteristic of the analyte, 
allowing resolution of dopamine from other substances. For quantification of 
changes in dopamine concentration over time, the current at its peak oxidation 
potential can be plotted for successive voltammetric scans. Waveform generation, 
data acquisition and analysis were carried out on a PC-based system using two PCI 
multifunction data acquisition cards and software written in LabVIEW (National 
Instruments). 
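
The quantification step can be sketched in a few lines. The example assumes each scan has been reduced to the currents sampled along the anodic sweep at a common list of potentials, and that background subtraction has already been applied; the function name, argument layout and defaults are illustrative assumptions, not the LabVIEW implementation used here.

```python
def dopamine_trace(potentials, scans, peak_potential=0.7, scan_interval_s=0.1):
    """Return [(time in s, current)] at the sample nearest the oxidation peak.

    potentials: anodic-sweep potentials in volts, shared by all scans.
    scans: one list of currents per voltammetric scan, aligned with potentials.
    scan_interval_s: 0.1 s between scans corresponds to the 10 Hz sampling rate.
    """
    # Index of the sampled potential closest to the peak oxidation potential.
    idx = min(range(len(potentials)), key=lambda k: abs(potentials[k] - peak_potential))
    return [(n * scan_interval_s, scan[idx]) for n, scan in enumerate(scans)]
```

Plotting the returned pairs gives current at the peak oxidation potential as a function of time, one point per 100 ms.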

Statistical analysis of voltammetry data. Voltammetric data were analysed
using software written in LabVIEW (National Instruments) and low-pass
filtered at 2,000 Hz. Dopamine was isolated from the voltammetric signal with
chemometric analysis* using a standard training set based upon stimulated 
dopamine release detected by chronically implanted electrodes. Dopamine con- 
centration was estimated on the basis of the average post-implantation sensitivity 
of electrodes”. Before the generation of surface plots and analysis of peak values, 
all data were smoothed with a 5-point within-trial running average. Peak dopa- 
mine values in response to the US and CS were obtained by taking the largest 
value in the 3-s period after stimulus presentation. Peak values were then com- 
pared using mixed models ANOVA with training session as the repeated measure 
and stimulus (CS and US) or phenotype (bHR and bLR) as the between-group 
measure. Peak CS-evoked dopamine signalling was also analysed across trials 
using linear regression. The slopes obtained for the regression were compared 
between groups using independent, two-sample t-tests. All post-hoc comparisons 
were made with the Bonferroni correction for multiple tests. All statistical ana- 
lyses were carried out using Prism (GraphPad Software). Voltammetric data for 
dopamine responses to the CS and US were also analysed using an area-under- 
the-curve approach. This approach did not alter the statistical effects of any 
comparison reported in the paper for peak dopamine value (specific statistical 
results not shown). 
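
The smoothing and peak-extraction steps can be sketched as follows. The window alignment is an assumption: the text specifies a 5-point within-trial running average but not whether it is centred, so this example uses a centred window that shrinks at the trial edges. Function names are hypothetical.

```python
def running_average(trace, window=5):
    """Centred running average; the window is truncated at the trial edges."""
    half = window // 2
    smoothed = []
    for k in range(len(trace)):
        lo, hi = max(0, k - half), min(len(trace), k + half + 1)
        smoothed.append(sum(trace[lo:hi]) / (hi - lo))
    return smoothed


def peak_after_stimulus(trace, stim_index, window_s=3.0, rate_hz=10):
    """Largest value in the window_s seconds starting at the stimulus sample."""
    n_samples = int(round(window_s * rate_hz))  # 30 samples at 10 Hz
    segment = trace[stim_index:stim_index + n_samples]
    if not segment:
        raise ValueError("stimulus index lies outside the trace")
    return max(segment)
```

Applying `running_average` to each trial before `peak_after_stimulus` follows the order described in the text; an area-under-the-curve measure would sum the same segment instead of taking its maximum.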

Histological verification of recording site. On completion of experimentation, 
animals were anaesthetized with intraperitoneal ketamine (100 mg kg⁻¹) and xylazine
(20 mg kg⁻¹) and then transcardially perfused with saline followed by 4%
paraformaldehyde. Brains were removed and post-fixed in paraformaldehyde for
24 h and then rapidly frozen in an isopentane bath (~5 min), sliced on a cryostat
(50-µm coronal sections, 20 °C) and stained with cresyl violet to aid in visualization
of anatomical structures.

Effects of flupenthixol on sign-tracking and goal-tracking performance. The 
effects of flupenthixol (a D1/D2 antagonist; Sigma) on the performance of 
sign-tracking and goal-tracking behaviour were examined after seven sessions 
of Pavlovian conditioning. All rats received an injection (i.p.) of 150, 300 or
600 µg kg⁻¹ of the drug one hour before Pavlovian conditioning sessions 9, 11
and 13. Doses of the drug (dissolved in 0.9% NaCl) were counterbalanced
between groups and interspersed with saline injections (i.p., 0.9% NaCl; before
sessions 8, 10, 12 and 14) to prevent any cumulative drug effects. The following 
measures were recorded to examine the effects of the drug on the CR: (1) the 
number of lever-CS contacts, (2) the latency to the first lever-CS contact, (3) the 
number of food-tray entries during lever-CS presentation, and (4) the latency to 
the first food-tray entry during lever-CS presentation. In addition, a nosepoke 
port was added to the test chamber on the wall opposite the retractable lever and 
responses into this port were recorded as an index of nonspecific activity. For all 
measures the response to saline was averaged (across sessions 8, 10, 12 and 14) 
and compared to the response following each of the three doses of flupenthixol.

Statistical analysis of effects of flupenthixol on performance of the CRs. The
effects of flupenthixol on the performance of sign-tracking and goal-tracking beha- 
viour (Supplementary Fig. 6) were analysed using linear mixed effects models with 
phenotype and dose treated as independent variables. Each phenotype was also 
analysed separately to determine the effect of dose on a given behaviour and 
Bonferroni post-hoc comparisons were made to determine whether behaviour at 
a given dose was significantly different from that in response to saline. 
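
The saline baseline used in these comparisons can be illustrated with a short sketch. It assumes one rat's responses are stored per session and that a mapping from drug session to dose (in µg kg⁻¹) is available; both data structures and the function name are hypothetical, and the sketch covers only the descriptive step, not the mixed-model analysis.

```python
from statistics import fmean

SALINE_SESSIONS = (8, 10, 12, 14)  # saline injections
DRUG_SESSIONS = (9, 11, 13)        # flupenthixol injections


def dose_effects(responses, dose_by_session):
    """Return (saline baseline, {dose: response minus baseline}) for one rat.

    responses: {session number: response measure, e.g. lever-CS contacts}
    dose_by_session: {drug session number: dose}, reflecting the counterbalancing
    """
    baseline = fmean(responses[s] for s in SALINE_SESSIONS)
    effects = {dose_by_session[s]: responses[s] - baseline for s in DRUG_SESSIONS}
    return baseline, effects
```

Because doses were counterbalanced, `dose_by_session` differs between rats, which is why the effects are keyed by dose rather than by session.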

Effects of flupenthixol on the learning of sign-tracking and goal-tracking. The 
effects of flupenthixol on the acquisition of sign-tracking and goal-tracking CRs 
were examined using two generations of bred rats (S21 and S22). Rats received an 
injection of either saline (i.p.; 0.9% NaCl) or 225 µg kg⁻¹ of flupenthixol one hour
before Pavlovian conditioning sessions 1-7. This dose of drug was chosen based 
on the ‘performance’ study described above, because we wanted to avoid any 
nonspecific inhibitory effects on motor activity. Rats from both generations that 
received flupenthixol before sessions 1-7 then received an injection of saline 
before session 8. However, only rats from the S22 generation that received saline
before sessions 1-7 also received saline before session 8. Thus, the number of rats
that received saline during training and were also pretreated with saline before 
session 8 is lower than that for the other groups (that is, on session 8, bHR saline, 
n = 10; bLR saline, n = 10). The following measures were recorded and analysed
to examine the effects of flupenthixol on sign-tracking and goal-tracking beha- 
viour: (1) the number of lever-CS contacts, (2) the latency to the first lever-CS 
contact, (3) the number of food-tray entries during lever-CS presentation, and (4) 
the latency to the first food-tray entry during lever-CS presentation. 

Statistical analysis of effects of flupenthixol on the learning of the CRs. Linear 
mixed effects models were used to examine the effects of flupenthixol on the per- 
formance and learning of sign-tracking or goal-tracking behaviour (Supplementary 
Fig. 6). For these analyses each phenotype was analysed separately to determine the 
effect of dose on a given behaviour and treatment (saline versus flupenthixol) and 
session (1-7) were treated as independent variables. To determine whether 
flupenthixol prevented the expression of the conditioned response or the learning 
of a conditioned response we also examined behaviour following a saline injection on 
session 8 (drug-free test session). Behaviour on session 8 was compared between 
treatment groups using an unpaired t-test for each phenotype separately. We also 
compared the response on session 8 of the groups that received flupenthixol during 
training to that of the group that received flupenthixol on session 1 (using a paired 
t-test) and to that of the saline control group on session 1 (using an unpaired t-test). 


Experimental niche evolution alters the strength of
the diversity-productivity relationship


Dominique Gravel, Thomas Bell, Claire Barbera, Thierry Bouvier, Thomas Pommier, Patrick Venail & Nicolas Mouquet


The relationship between biodiversity and ecosystem functioning 
(BEF) has become a cornerstone of community and ecosystem eco- 
logy’? and an essential criterion for making decisions in conser- 
vation biology and policy planning*”. It has recently been proposed 
that evolutionary history should influence the BEF relationship 
because it determines species traits and, thus, species’ ability to 
exploit resources’. Here we test this hypothesis by combining 
experimental evolution with a BEF experiment. We isolated 20 
bacterial strains from a marine environment and evolved each to 
be generalists or specialists*. We then tested the effect of evolutionary 
history on the strength of the BEF relationship with assemblages 
of 1 to 20 species constructed from the specialists, generalists and 
ancestors’. Assemblages of generalists were more productive on 
average because of their superior ability to exploit the environmental 
heterogeneity’®. The slope of the BEF relationship was, however, 
stronger for the specialist assemblages because of enhanced niche 
complementarity. These results show how the BEF relationship 
depends critically on the legacy of past evolutionary events. 

Two fundamental ecological mechanisms can generate positive BEF 
relationships'''’. First, species may occupy complementary ecological 
niches, for example by feeding on different resources. In communities 
of complementary species, more of the total available niche space is 
filled in diverse communities, resulting in better community-wide 
resource use. Second, high-functioning and competitively dominant 
species are more likely to be found within species-rich communities 
(the sampling effect). Both mechanisms require a detailed understand- 
ing of species’ phenotypic traits'*'*. There has been, however, virtually 
no effort to understand how the evolution of species traits within eco- 
logical communities affects ecosystem functioning®””». 

One important trait that determines complementarity is the degree 
of resource specialization, that is, the number of resources a species is 
able to exploit. Species niche width will tend to evolve to match the 
amount of available environmental variation’*’’”. In simple environ- 
ments specialized types are expected to evolve, whereas generalists are 
more likely to appear in environments containing many resources'*"”. 
The degree of specialization could alter the BEF relationship"®, so a full 
understanding of the relationship must account for the evolutionary
forces driving trait diversity. 

In communities containing only specialist species that feed on dif- 
ferent resources, the species do not compete with each other and their 
effects on ecosystem functioning (here productivity) are therefore 
additive (that is, the BEF relationship is linear; Fig. 1). However, the 
increased ability to exploit one resource might come with a lower ability 
to exploit any other, that is, there is a trade-off between resource usage 
ability'® (Fig. 1a). For any type of trade-off, evolution towards generali- 
zation will affect the BEF relationship (Fig. 1b and Supplementary 
Information, section 1). First, ecosystem functioning at low diversity 
should be lower for specialists because they are inefficient at exploiting 
environmental heterogeneity. Second, generalization should reduce the 


contribution of additional species to ecosystem functioning and, thus, 
the slope (that is, strength) of the BEF relationship. Generalization also 
increases niche overlap and thus produces a nonlinear, saturating BEF 
relationship. We therefore predicted that the ecosystem functioning 
would be higher for generalists at low diversity and that the slope of 
the BEF relationship would be reduced in communities of generalists. 
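
These predictions can be explored numerically with the chemostat model and parameter values given in the Fig. 1 legend (inflow concentration 1, dilution rate 0.1, mortality 0.1). The sketch below is an illustration only: the forward-Euler integration, step size, run length, starting densities and function names are assumptions made for this example and are not the procedure used to produce the figure.

```python
def equilibrium_productivity(alpha, R0=1.0, e=0.1, m=0.1, dt=0.05, t_end=500.0):
    """Integrate the chemostat model and return total resource consumption.

    alpha[i][j] is the per capita consumption rate of resource i by consumer j.
    """
    n_res, n_con = len(alpha), len(alpha[0])
    R = [R0] * n_res      # resources start at the inflow concentration
    C = [0.01] * n_con    # consumers start rare (assumed initial density)
    for _ in range(int(t_end / dt)):
        uptake = [[alpha[i][j] * R[i] * C[j] for j in range(n_con)] for i in range(n_res)]
        dR = [e * R0 - e * R[i] - sum(uptake[i]) for i in range(n_res)]
        dC = [sum(uptake[i][j] for i in range(n_res)) - (m + e) * C[j] for j in range(n_con)]
        R = [R[i] + dt * dR[i] for i in range(n_res)]
        C = [C[j] + dt * dC[j] for j in range(n_con)]
    return sum(alpha[i][j] * R[i] * C[j] for i in range(n_res) for j in range(n_con))


def specialist_alpha(n_res, n_con):
    """Consumer j uses only resource j, at rate 1."""
    return [[1.0 if i == j else 0.0 for j in range(n_con)] for i in range(n_res)]


def generalist_alpha(n_res, n_con, preferred=0.6, alt_total=0.4):
    """Consumer j uses resource j at rate `preferred` and splits `alt_total`
    equally among the other resources (0.4 is the linear trade-off case)."""
    alt = alt_total / (n_res - 1)
    return [[preferred if i == j else alt for j in range(n_con)] for i in range(n_res)]
```

For a lone specialist the equilibrium can be checked by hand: the resource settles at m + e = 0.2, the consumer at e(R0 − 0.2)/0.2 = 0.4, and productivity at 0.2 × 0.4 = 0.08, so each added specialist contributes another 0.08. Calling `equilibrium_productivity(generalist_alpha(20, s))` for s from 1 to 20 gives the corresponding saturating curve; in pure Python the 20-species runs take noticeably longer than the small cases.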

We tested these predictions by experimentally evolving niche speciali- 
zation and conducting BEF experiments. Briefly, we promoted the evolu- 
tion of generalist and specialist strategies from 20 ancestral bacterial 
strains that had been isolated from a marine environment. Each strain 
was grown either on a single resource (one different carbon substrate for 
each strain) or on a mixture of 31 resources (with a total resource avail- 
ability equal to that in the single-resource treatment). Bacteria were 
serially transferred to fresh medium every 48 h for 32 transfers, allowing 
evolutionary adaptations” (Supplementary Information, section 2). 
Bacteria grown on the mixed medium (hereafter called generalists) 
tended to have higher performance on a wide array of substrates in 
comparison with the bacteria evolved on the simple medium consisting 
of a single resource (hereafter called specialists; see below). 

We conducted the BEF experiment on the mixed medium of 31 
resources for both evolutionary schemes (generalist and specialist) 
and with the ancestral strains (hereafter referred to as species). We used
the productivity (bacterial metabolic activity) after 48 and 72 h as our
measure of ecosystem functioning’. The selection period resulted in a 
substantially increased productivity at all levels of species richness for 
both specialist and generalist assemblages (Fig. 2). Productivity significantly
increased with the logarithm of species richness (48 h, F_{1,1506} = 291.2,
P < 0.001; 72 h, F_{1,1506} = 179.2, P < 0.001), indicating that productivity was a
saturating function of species richness. Productivity in monocultures differed
significantly among treatments (48 h, F_{2,1506} = 1,751.9, P < 0.001; 72 h,
F_{2,1506} = 2,309.3, P < 0.001), with ancestors performing the worst, followed in
order by specialists and generalists. The slope of the BEF relationship also
differed significantly among treatments (48 h, F_{2,1506} = 16.2, P < 0.001; 72 h,
F_{2,1506} = 8.8, P < 0.001), being steeper for specialists. A model accounting for
species composition also showed that the BEF relationship of ancestors was best
described by a linear function of species
ship of ancestors was best described by a linear function of species 
richness and a nonlinear (saturating) function for the two evolutionary 
treatments (Supplementary Information, section 3). There was no 
relationship between the contribution of ancestors to the BEF and 
the contribution of their evolved counterparts (Supplementary Infor- 
mation, section 4). The difference between the slopes of the specialist 
and generalist BEF relationships was even stronger after 72 h, a result 
similar to those of experiments conducted with plant*’* and marine” 
communities. 
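
To make the intercept and slope terms concrete, the sketch below fits productivity against the logarithm of species richness for a single treatment by ordinary least squares. It is a simplified stand-in for the analysis of covariance, which fits all treatments jointly; the function name is an assumption and any data passed to it here would be illustrative, not values from the experiment.

```python
import math


def log_richness_fit(richness, productivity):
    """Least-squares fit of productivity = intercept + slope * ln(richness)."""
    x = [math.log(s) for s in richness]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(productivity) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("at least two distinct richness levels are required")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, productivity))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Because ln(1) = 0, the intercept is the fitted monoculture productivity, and the slope is the strength of the BEF relationship compared among treatments above.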

We investigated whether the difference in the strength of the BEF 
relationship resulted from specialization, by growing each of the ancestral 
strains and specialist and generalist lineages on the 31 individual carbon 
substrates to estimate their final niche width. We recorded the number 

Figure 1 | Theoretical predictions of the effect of niche specialization on the
strength of the BEF relationship. a, Hypothetical relationship between the
consumption rates of a generalist on two resources. A perfect trade-off follows a
straight line and any deviation from this reflects a relative cost (strong trade-off) or
benefit (weak trade-off) of generalization. b, The BEF relationship will be
affected by generalization and the type of trade-off. The figure represents a
simulation experiment with N = 20 resources ($R_i$) and up to as many (S = N)
consumer species ($C_j$). The dynamics of this system are given by the simple
chemostat model

$$\frac{dR_i}{dt} = eR_0 - eR_i - \sum_{j=1}^{S} \alpha_{ij} R_i C_j, \qquad \frac{dC_j}{dt} = \sum_{i=1}^{N} \alpha_{ij} R_i C_j - mC_j - eC_j,$$

where e is the dilution rate, $R_0$ is the resource
concentration in the inflow (for simplicity, we suppose that all resources have
the same concentration in the inflow), $\alpha_{ij}$ is the per capita consumption rate of
the resource i by consumer j, and m is the mortality rate. The ecosystem
productivity at equilibrium is equivalent to the total resource consumption and
is given by $B = \sum_{i=1}^{N} \sum_{j=1}^{S} \alpha_{ij} R_i C_j$. Our simulation parameters are $R_0 = 1$, e = 0.1
and m = 0.1. We specified for the specialist that the consumption rate for its
preferential resource is $\alpha_{ii} = 1$ and $\alpha_{ij} = 0$ ($i \neq j$) for the alternative resources. For
the generalists, we specified the performance on the preferential resource to be
$\alpha_{ii} = 0.6$ and an equal partitioning of the consumption rates between alternative
resources that sums to 0.4 for the linear trade-off, 0.1 for the strong trade-off
and 0.55 for the weak trade-off. We simulated communities of 1 to 20 species
for specialists and generalists with linear, weak and strong trade-offs. Further
details of the model and analytical results are given in Supplementary
Information.


of substrates each lineage was able to exploit. Bacteria cultured on 
single substrates adapted to fewer substrates than bacteria cultured 
on mixed medium (Table 1 and Supplementary Information, section 
5). The generalists were able on average to exploit a larger number of 
substrates (10.75 ± 1.49 (s.e.) of the 31 substrates) than the specialists
(4.80 ± 0.51 (s.e.); t-test for paired samples, t_{19} = 3.45, P = 0.002). The
average number of shared substrates between all pairs of strains and 
lineages was also much higher for generalists than for ancestors and 
specialists (Table 1). This ability to exploit more substrates is reflected 
in the performance of the bacteria when grown on the mixed medium. 
The maximal performance recorded for each substrate for generalists 
was also significantly higher than the maximal performance for specialists
(t-test for paired samples, t_{30} = 2.95, P = 0.006; Table 1), which
suggests there was no trade-off in resource usage ability. It has been 

Figure 2 | Evolutionary treatments affect the strength of the
biodiversity-productivity relationships. Productivity is measured as the absorbance at
590 nm after 48 h (corresponding to the conditions experienced during the
selection experiment; a) and 72 h (b). Data show mean ± s.e. (n = 504 per
evolutionary treatment). Lines depict the results of the analysis of covariance
model.


shown that concurrent adaptation to multiple resources does not 
always limit the capability to exploit each resource individually*. In 
fact, mutations increasing fitness on a given resource can even some- 
times increase fitness on other resources (that is, synclinal selection”), 
preventing the occurrence of trade-offs. We also checked for within- 
lineage genotypic variability and found it to be low (Supplementary 
Information, section 6). Most generalist lineages were composed of 
generalist genotypes, but some consisted of mixtures of coexisting 
specialist genotypes. However, the qualitative results and conclusions 
(Fig. 2) were unaffected when the analysis was restricted to only those 
lineages that were composed of the most generalist genotypes (Sup- 
plementary Information, section 6). 
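
The niche-width and overlap measures used in this paragraph can be sketched as simple set operations. The example assumes each lineage's assay is a mapping from substrate name to absorbance and that a threshold derived from blank wells is supplied by the caller; the names and data layout are hypothetical.

```python
from itertools import combinations


def usable_substrates(absorbance, threshold):
    """Set of substrates whose absorbance exceeds the blank-derived threshold."""
    return {name for name, value in absorbance.items() if value > threshold}


def mean_shared_substrates(niches):
    """Average number of substrates shared between all pairs of lineages."""
    pairs = list(combinations(niches, 2))
    if not pairs:
        raise ValueError("at least two lineages are required")
    return sum(len(a & b) for a, b in pairs) / len(pairs)
```

Niche width is then `len(usable_substrates(...))` for a lineage, and `mean_shared_substrates` applied to the 20 lineages of a treatment gives the pairwise overlap.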

Some lineages that evolved on the mixed medium responded weakly 
to the experimental evolution and remained specialists, and some 
lineages from the specialist treatment evolved towards generalization 
(Supplementary Information, section 5). Consequently, there is vari- 
ance in ecosystem functioning that should be better predicted by func- 
tional diversity’. In addition to enhanced complementarity, better 
performance of the generalists on each carbon source might also have 
contributed to the difference in the BEF relationships between evolu- 
tionary treatments (Fig. 2). We calculated a niche diversity index 
(NDI) for every assemblage of the specialist and generalist treatments. 
The NDI is the number of different substrates a community is able to 
exploit, calculated on the basis of assays of individual species” 
(Supplementary Information, section 7). Given that generalists were 
more productive on each substrate, we expected that for a given NDI 

                                                               Ancestral strains   Specialist lineages   Generalist lineages
No. of substrates used                                         4.00 ± 0.69         4.80 ± 0.51           10.75 ± 1.49
No. of substrates used by the 20 species assemblages           16                  14                    29
Average no. of shared substrates between strain-lineage pairs  2.25 ± 0.08         2.74 ± 0.11           5.80 ± 0.24
Average maximal productivity for each substrate                0.21 ± 0.07         0.23 ± 0.08           0.42 ± 0.10
Productivity of monocultures on the mixed medium               0.01 ± 0.01         0.12 ± 0.01           0.19 ± 0.02

Niche specialization was assessed from performance assays on the 31 carbon sources. A strain (or lineage) was considered able to exploit a substrate when its absorbance at 590 nm after 48 h was larger than the 95% distribution of the blanks. All data, mean ± s.e.


productivity would be larger for the generalist assemblages. We found 
that productivity significantly increases with community NDI 
(F1,667 = 84.0, P < 0.001; Fig. 3) (we note that the analysis is conducted
on the NDI range from 3 to 14 to meet the requirements of analysis of
covariance) and the logarithm of species richness (F1,667 = 5.23,
P = 0.022). The intercepts and the slope of the NDI-productivity rela-
tionship differ between evolutionary treatments (48 h, F1,667 = 72.6,
P < 0.001; 72 h, F1,667 = 19.4, P < 0.001). Overall, most of the variance
is accounted for by the NDI and the effect of the evolutionary treatments 
on the intercept. The larger amount of explained variance by the NDI 
argues for complementarity as the dominant mechanism. Our experi- 
mental evolutionary treatments therefore affected both species comple- 
mentarity and maximal productivity at equivalent complementarity. 
The ancestral strains had not previously encountered the experi- 
mental conditions, so it is unsurprising that the intercept of the BEF 
relationship was greatly reduced as a result of their maladaptation 
(Fig. 2). Nonetheless, there is still a significant positive BEF relation- 
ship. Because each species was equally represented in the experiment, 
it is possible to estimate the degree to which they were associated with 
higher- or lower-than-average levels of functioning’®. We found that 
the inferred species contributions were dominated by a single ancestral 
strain, whereas they were distributed more equitably in the specialist 
and the generalist treatments (Supplementary Information, sections 3 
and 4). We note there is no significant correlation between the ances- 
tors’ contribution to the BEF and the contribution of their evolved 
counterparts. The data therefore provide evidence that evolutionary 
history could affect both mechanisms of the BEF relationship. First, the 
BEF relationship will be stronger for communities of specialists 
because of enhanced complementarity. Second, if most species are 
maladapted, few species are able to contribute to functioning and 


[Figure 3 plot: ecosystem functioning against NDI (x axis, 5 to 30) for specialist and generalist assemblages.]


Figure 3 | The relationship between NDI and ecosystem functioning. NDI is 
the total number of carbon substrates a community is able to exploit, assessed 
from the individual ability of each lineage to exploit the carbon substrates and 
the community composition. 


sampling effects dominate. Such a mechanism might be particularly 
important for ecosystem functioning in variable environments, where 
species are far from their optimal fitness peaks”’. 

In this study, we have deliberately evolved independent lineages of 
specialists and generalists to compare assemblages of species that come 
from the same ancestral strain but have different evolutionary histories. 
It is likely that in nature species will diversify in complex assemblages of 
specialists and generalists depending on the environmental context”. 
We note that we cannot exclude the possibility that the generalists were 
an ensemble of specialist genotypes. Previous work has, however, 
shown that selection in a heterogeneous environment is most likely 
to result in the evolution of generalists*. In any case, the genotypic 
variability in a population, leading to increased generality, is still a 
species’ trait that will influence the BEF relationship. Some of the 
changes observed between the evolved lineages might also have come 
through physiological adaptation. However, after at least a hundred 
generations we have found highly contrasting metabolic profiles 
(Table 1), no correlation between species contributions to the BEF 
relationship (Supplementary Information, section 3) and a strong res- 
ponse to selection (Supplementary Information, section 2). All of these 
observations are consistent with evolutionary changes. 

A variety of BEF relationships have been observed and different 
ecological mechanisms have been inferred’*””*°. Our results provide 
strong support for the role of complementarity and evolutionary history 
in BEF. We found that specialists contribute more to the BEF. 
Monocultures of generalists were also found among the most produc- 
tive assemblages. For conservation decisions, these results emphasize 
that, on average, the loss of specialists will have stronger effects on 
ecosystem functioning, but that losing a generalist species might have 
disproportionate effects when there is low redundancy. Our under- 
standing of the mechanisms underlying the BEF relationship has now 
moved to a point where we can not only distinguish among mechanisms,
but can also manipulate these mechanisms experimentally.
Investigations should now turn to understanding the evolutionary pres- 
sures that maintain niche diversification in natural communities, along 
with the trade-offs involved, and their effect on the BEF relationship. 


METHODS SUMMARY 


We isolated 31 phage-free bacterial strains from coastal sea water sampled off the 
Bay of Blanes, Spain, on the basis of their morphologies. We sequenced the 16S 
ribosomal DNA genes of the ancestral strains to confirm that different taxa were 
used in the experiment (Supplementary Information, section 8). Each strain indi- 
vidually underwent selection on a different, single-carbon substrate of an EcoPlate 
to obtain specialists, and underwent selection on a highly mixed medium made 
from a mixture of all 31 EcoPlate carbon substrates to obtain generalists. We 
transferred the bacteria to a fresh medium every 48 h during 64 d of incubation
at 20°C. After the selection period, we kept 20 lineages from among those that 
persisted and conducted a BEF experiment. We assembled microcosms with 
diversity levels, s, of 1, 2, 4, 5, 10 and 20 species for each of the three treatments 
(ancestral strains and specialist and generalist lineages). For each evolutionary 
treatment and diversity level, we created 20/s different assemblages by randomly 
selecting species from the species pool without replacement (for example, if s = 5 
we randomly assigned the 20 species to four assemblages)’, for a total of 42 
assemblages. We carried out this process independently four times, so there were 
a total of 168 different assemblages for each evolutionary treatment. Each assem- 
blage was replicated three times for a total of 1,512 microcosms (the product of 
three treatments, three replicates and 168 assemblages). We measured the light 
absorbance at 590 nm after 48 and 72 h to approximate productivity (reported
values were corrected by removing the average value of the blanks). We also
conducted assays for each ancestral strain and the specialist and generalist
lineages by incubating them on the 31 carbon substrates for 48 h at 20 °C. The
assays were used to quantify generality and the niche diversity in the assemblages. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Bacterial isolation. Bacterial strains were isolated from coastal sea water sampled 
from the Bay of Blanes, Spain (40° 40’ N, 2° 50’ E) on 20-21 September 2007. Five 
replicate samples of 100 µl of sea water were spread on marine agar plates (BD
Difco Marine Agar 2216; autoclaved for 20 min at 121 °C) and grown for 5 d at
12 °C, which was the in situ temperature at sampling time. Ninety-five colonies
with distinct morphotypes (that is, size, shape and colour) were isolated over four
weeks, clean-streaked three times and frozen in glycerol at −80 °C.

We then sequenced the 16S rDNA genes of the ancestral strains to confirm 
different taxa. A single colony of each of the 95 isolates was picked and dissolved in 10 µl
TE (10 mM Tris, 1 mM EDTA, pH 8.0) buffer, heated for 5 min at 95 °C and
centrifuged briefly. The supernatant (1 µl) was used as the PCR template for
16S rDNA gene amplification. The PCR reaction buffer (total volume, 25 µl)
contained 200 µM of each deoxynucleoside triphosphate in 10 mM Tris-HCl
(pH 9.0), 50 mM KCl, 1.5 mM MgCl2, 0.2 µM of primers 27F and 1492R*! and
~2.5 units of puReTaq polymerase as included in the illustra puReTaq Ready-To- 
Go PCR beads kit (GE Healthcare). The PCR thermal cycling programme was as 
follows: 95 °C for 2 min; 30 cycles of 95 °C for 30s, 50 °C for 30 s and 72 °C for 45 s; 
and 72°C for 7min. The PCR products were sent to AGOWA Genomics for 
unidirectional sequencing using primers 27F and 519R*'. The quality of the 
sequences was controlled by removing traces of the sequencing primers by using 
PHRED” with a base-calling score of n = 20. Ambiguous base calls (that is, “N’s) at 
the ends of the sequences were also trimmed away. All sequences were analysed 
using the programs MALLARD* and CHECK_CHIMERA from the Ribosomal 
Database Project**. Neither program detected any chimaeras. Resulting sequences 
were then compared with the SILVA database” using the program BLAST”. A 
phylogenetic tree was built to infer the relationships among the ancestral strains 
and their closest known relatives. In combination with their morphological char- 
acteristics, we considered the 95 selected strains as different taxa (Supplementary 
Information, section 6). 

All 95 bacterial strains were tested for production of prophage (that is, to see
whether they contained an inducible viral genome) on treatment with and without
(control) the inducing agent mitomycin C at a final concentration of 1 µg ml⁻¹ (ref.
31). Incubations were carried out in 96-well microplates for 24 h. The growth
kinetics of each strain was obtained by inoculating 10 µl of overnight cultures in
200 µl marine broth culture medium (BD Difco Marine Broth 2216; autoclaved for
20 min at 121 °C). Cultures were allowed to grow in the dark at 20 °C for 48 h.
Changes in cell density were measured by the amount of light absorbance at 590-nm
wavelength (FLUOstar OPTIMA spectrophotometer, BMG) every 15 min. We
calculated the difference in the absorbance at stationary phase between control 
and treated samples, and found a bimodal distribution (no effect and strong effect 
of mitomycin C) with a threshold corresponding to a 30% growth reduction by 
mitomycin C. We defined a strain containing cells with an inducible viral genome as 
a strain with a growth reduction larger than this threshold, and therefore consider 
strains under this threshold to be phage free. Of those strains that fell below the 30% 
threshold, we randomly selected 31 strains for the selection experiment. 
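
The screening rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: growth reduction is taken here as the relative drop in stationary-phase absorbance between control and mitomycin-C-treated cultures (the text gives the 30% threshold but not the exact formula), and the function names and absorbance values are hypothetical, not data from the study.

```python
import random

THRESHOLD = 0.30  # 30% growth reduction, the threshold reported in the text


def growth_reduction(control_abs, treated_abs):
    """Relative drop in stationary-phase absorbance (assumed definition)."""
    if control_abs <= 0:
        raise ValueError("control absorbance must be positive")
    return (control_abs - treated_abs) / control_abs


def phage_free(strains, threshold=THRESHOLD):
    """Names of strains whose growth reduction falls below the threshold."""
    return [name for name, (ctrl, trt) in strains.items()
            if growth_reduction(ctrl, trt) < threshold]


def pick_for_selection(candidates, k, seed=None):
    """Randomly draw k phage-free strains without replacement."""
    rng = random.Random(seed)
    return rng.sample(candidates, k)


# Hypothetical (control, treated) absorbances; illustrative values only.
example = {"A": (1.00, 0.95), "B": (0.80, 0.30), "C": (0.60, 0.50)}
candidates = phage_free(example)
chosen = pick_for_selection(candidates, 2, seed=1)
```

In the experiment itself the same draw would select 31 strains from those classified as phage free.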
Selection experiment. BIOLOG EcoPlates contain 31 different carbon substrates 
(plus one blank) belonging to different chemical families. In addition to the carbon 
substrates, each well contains a fluorogenic tetrazolium dye (5 cyano-2,3 ditolyl 
tetrazolium chloride), which is reduced to a violet-fluorescent formazan molecule 
when the carbon source is oxidized. Colour development was measured spectro- 
photometrically at 590nm with a FLUOstar OPTIMA spectrophotometer and 
used as a proxy of metabolic activity’. Each of the 31 strains was used to establish 
three replicates of specialist and generalist selection lines. For the specialist treat- 
ment, each strain was assigned at random to one of the 31 carbon substrates. The 
EcoPlates used for the specialist treatment were prepared 2 h before each transfer
with the addition of 140 µl of M9 minimal salts (0.1 g l⁻¹ NH4Cl, 6 g l⁻¹ Na2HPO4,
3 g l⁻¹ KH2PO4, 0.5 g l⁻¹ NaCl) with salinity adjusted to 35.6 by the addition of
NaCl, to match the salinity of the environment from which they were sampled. For 
the generalist treatment, we obtained a complex medium by mixing all of the 31 
carbon EcoPlate substrates. The EcoPlates were prepared as for the specialists and 
after 30 min the contents of the EcoPlates (except the blanks) were transferred into 
a sterile flask, mixed with an orbital shaker and redistributed across a 96-well 
sterile microplate. 

Each colony was initially grown for 24 h at 20 °C in 0.5 ml marine broth medium 
under constant orbital shaking. This solution (10 µl) was then used to inoculate
each EcoPlate well from the two selection treatments. We intentionally did not 
wash the cells of marine broth medium to transfer a small quantity of this medium 
and assure the survival of the strains. Preliminary trials showed most of the strains 
did not initially survive in the EcoPlates without the marine broth. Consequently, 
for the first seven transfers we added a minimal quantity of marine broth, which 
we reduced from 7% (v/v) to 0% in steps of 1% at each transfer. The bacteria were 
incubated for 48 h in the dark at 20 °C in humid chambers. Bacteria were serially 
transferred to maintain maximal growth rate and to renew the substrate. A transfer
consisted of inoculating every well of a new plate with 10 µl from the corresponding
previous well. Before each transfer, new EcoPlates were prepared as described
above. The mixed medium was prepared at every tenth transfer and stocked at
4 °C. The selection experiment ran for 32 transfers (several hundred generations).
At the thirty-second transfer, the contents of every well were amended with glycerol
(50% v/v) and frozen at −80 °C.

We measured light absorbance after conducting the selection experiment. 
Several lineages or replicates went extinct during the experiment, mostly for the 
specialist treatment. Therefore, we selected the most productive replicate among 
the lineages that survived the specialist treatment, and the corresponding lineages 
of the generalist treatment and the ancestors. We used 20 strains/lineages out of 
the 31 that were subject to the selection experiment for the BEF experiment. 
BEF experiment. We assembled random combinations of species at six levels of 
species richness for each of the three treatments (ancestors, evolved specialists and 
generalists). We used an experimental design that allowed separation of the effects 
of species richness and species composition’”*. The experimental design consisted 
of a set of 20/s microcosms, each with s species present. Within this set, the 
microcosm assemblages were constructed by sampling all of the 20 species without 
replacement. The construction of a system of microcosms was carried out
independently four times. We chose values of s to be every factor of 20 (s = 1, 2, 4, 5, 10,
20), so for any given s the number of assemblages considered was 4 × 20/s. Each
assemblage was replicated three times, so in the experiment as a whole there were
3 × 3 × 4 × (1 + 2 + 4 + 5 + 10 + 20) = 1,512 microcosms.
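
The design arithmetic above can be checked with a short sketch. It assumes that each independent construction is a random partition of the 20 species into 20/s groups of s species; the constant and function names are illustrative and not taken from the paper.

```python
import random

SPECIES = list(range(20))               # 20 strains/lineages per treatment
RICHNESS_LEVELS = [1, 2, 4, 5, 10, 20]  # every factor of 20
N_PARTITIONS = 4                        # independent constructions
N_REPLICATES = 3
N_TREATMENTS = 3


def partition(species, s, rng):
    """Split the pool into len(species)/s assemblages of s species each."""
    pool = species[:]
    rng.shuffle(pool)
    return [sorted(pool[i:i + s]) for i in range(0, len(pool), s)]


def build_design(seed=0):
    """Return {s: [assemblage, ...]} for one treatment, over all partitions."""
    rng = random.Random(seed)
    design = {}
    for s in RICHNESS_LEVELS:
        design[s] = []
        for _ in range(N_PARTITIONS):
            design[s].extend(partition(SPECIES, s, rng))
    return design


design = build_design()
assemblages_per_treatment = sum(len(v) for v in design.values())
total_microcosms = N_TREATMENTS * N_REPLICATES * assemblages_per_treatment
```

Because sampling is without replacement within a partition, every species appears exactly once at each richness level in each construction, which is what allows richness and composition effects to be separated.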

Bacterial communities were assembled in six sterile 96-well, 1-ml microplates. 
Bacteria were initially grown for 24h at 20°C in 6ml marine broth medium 
under constant orbital shaking in humid chambers. The cultures were centri- 
fuged (5 min at 3,500r.p.m.) and washed by eliminating the marine broth and 
adding 6 ml M9 minimal salts with buffered salinity. Because the cultures had 
different productivities in the marine broth, we first measured cell density by 
flow cytometry**. We adjusted the cell density to a concentration of 5 X 10> 
cells ml! with buffered M9. The bacteria were left in starvation for 2h before 
20 µl (40 µl for the monocultures) was inoculated into the appropriate wells.
Once the cultures were distributed in the appropriate wells of the six plates, 10 µl
of each community was transferred into three replicated microplates containing
140 µl of the mixed medium (that is, the assemblages were initiated with
5x10 ° cells), for a total of 18 microplates. The cultures were incubated at 
20°C in humid chambers for 72h. Light absorbance at 590 nm was measured at 
48 and 72h. 

Assays. Assays were conducted to measure strain/lineage performances on each 
carbon substrate at the end of the experiment. Before the assays, frozen cultures 
from the end of the selection period were reconditioned in 6 ml marine broth 
medium for 24h at 20°C in humid chambers under constant orbital shaking. The 
cultures were centrifuged (5 min at 3,500r.p.m.) and washed by removing the 
marine broth and adding M9 minimal salts with buffered salinity to adjust cell 
density to 3.3 X 10° cells ml (a pilot study showed that this concentration was 
optimal to obtain a signal differentiating strains). The cultures were left in
starvation for 2 h. The EcoPlates were prepared with 120 µl of the buffered M9
solution. The EcoPlates were incubated with 30 µl of culture. Each strain/lineage was
incubated in triplicate at 20°C in humid chambers. Light absorbance at 590 nm 
was measured after 48 h. 

Statistical analyses. The selection treatment (three levels) and species richness 
(log-transformed) were entered into an analysis of covariance of the bacterial 
productivity. The dependent variable was the light absorbance at 590 nm and we 
analysed the effects of the experimental treatments after 48 and 72h. The results 
of the assays were averaged over the three replicates. A strain/lineage was con- 
sidered to be able to exploit a carbon substrate when the light absorbance was 
larger than the 95% of the distribution of the blanks. For each assemblage, we 
calculated the NDI, which is the total number of carbon substrates that a com- 
munity is able to exploit, calculated on the basis of the community composition 
and the individual ability of each strain/lineage to exploit the carbon substrates. 
A second analysis of covariance was conducted (excluding the ancestors) con- 
sidering the NDI instead of species richness. We also fitted a linear model that 
assessed the effect of species richness and species identity on ecosystem func- 
tioning without requiring knowledge of the contribution of individual species to 
ecosystem functioning in mixture (see ref. 26 for details of this methodology). 
The model returns species-specific coefficients that could be interpreted as the 
contribution of individual species to ecosystem functioning relative to the aver- 
age species. 
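
The NDI calculation described above amounts to taking the union of the substrates that the members of a community can exploit. The sketch below makes two assumptions that the text does not spell out: the exploitation threshold is read as the 95th percentile of the blank absorbances (nearest-rank method), and each strain/lineage profile is a list of replicate-averaged absorbances indexed by substrate. All names and numbers are hypothetical.

```python
import math


def exploitable(absorbance, blanks, quantile=0.95):
    """True if absorbance exceeds the chosen quantile of the blank readings."""
    ordered = sorted(blanks)
    # Nearest-rank quantile of the blank distribution (assumed interpretation).
    rank = max(1, math.ceil(quantile * len(ordered)))
    return absorbance > ordered[rank - 1]


def niche(profile, blanks):
    """Set of substrate indices that a strain/lineage can exploit."""
    return {i for i, a in enumerate(profile) if exploitable(a, blanks)}


def ndi(community, niches):
    """Niche diversity index: distinct substrates used by any community member."""
    used = set()
    for member in community:
        used |= niches[member]
    return len(used)


# Hypothetical absorbances on four substrates; not data from the study.
blanks = [0.00, 0.01, 0.02, 0.01, 0.03]
profiles = {"x": [0.50, 0.02, 0.00, 0.20], "y": [0.01, 0.40, 0.00, 0.30]}
niches = {name: niche(p, blanks) for name, p in profiles.items()}
ndi_pair = ndi(["x", "y"], niches)
ndi_mono = ndi(["x"], niches)
```

Two specialists with little niche overlap give a higher NDI than either alone, which is the complementarity signal that the second analysis of covariance tests.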
Development of asymmetric inhibition underlying
direction selectivity in the retina 


Wei Wei, Aaron M. Hamby, Kaili Zhou & Marla B. Feller


Establishing precise synaptic connections is crucial to the develop- 
ment of functional neural circuits. The direction-selective circuit 
in the retina relies upon highly selective wiring of inhibitory inputs 
from starburst amacrine cells’ (SACs) onto four subtypes of ON- 
OFF direction-selective ganglion cells (DSGCs), each preferring 
motion in one of four cardinal directions’. It has been reported 
in rabbit that the SACs on the ‘null’ sides of DSGCs form func- 
tional GABA (y-aminobutyric acid)-mediated synapses, whereas 
those on the preferred sides do not*®. However, it is not known 
how the asymmetric wiring between SACs and DSGCs is estab- 
lished during development. Here we report that in transgenic mice 
with cell-type-specific labelling, the synaptic connections from 
SACs to DSGCs were of equal strength during the first postnatal 
week, regardless of whether the SAC was located on the preferred or 
null side of the DSGC. However, by the end of the second postnatal 
week, the strength of the synapses made from SACs on the null side 
of a DSGC significantly increased whereas those made from SACs 
located on the preferred side remained constant. Blocking retinal 
activity by intraocular injections of muscimol or gabazine during 
this period did not alter the development of direction selectivity. 
Hence, the asymmetric inhibition between the SACs and DSGCs is 
achieved by a developmental program that specifically strengthens 
the GABA-mediated inputs from SACs located on the null side, in a
manner not dependent on neural activity. 

The ability to detect motion in the visual scene is a fundamental 
computation in the visual system that is first performed in the retina. 
Motion direction is encoded by DSGCs, which fire a maximum num- 
ber of action potentials during movement in their preferred direction, 
but fire minimally for movement in the opposite, or null, direction*”. 
In the mammalian retina, the directional preference of an ON-OFF 
DSGC is caused by asymmetric inhibitory inputs: movement in the 
null direction causes strong inhibition that effectively shunts light-
evoked excitatory inputs. Indeed, blocking GABA-A receptors abolishes
the directionality of DSGCs by increasing spiking in response to null- 
direction motion®*. Null-side inhibition is thought to arise from SACs 
because their processes cofasciculate with DSGC dendrites”"®, where 
they form direct GABAergic synapses’, and because ablation of SACs 
eliminates the directional preference of DSGCs'*”. 

How SAC-DSGC synapses are organized to provide asymmetric 
inhibition has been an intriguing but difficult question because no 
apparent asymmetry is detected in the morphology or the distribution 
of synaptic markers in DSGCs and SACs'*"”’. The first and only piece of 
evidence for the synaptic basis of asymmetric inhibition came from a 
functional study between SAC and DSGC pairs in rabbit retina*, which 
suggested that SACs on the null side provide inhibitory inputs to the 
DSGCs but that those on the preferred side do not. Whether this asym- 
metric inhibition exists in the mouse is not known. In addition, because 
the directional preference of an ON-OFF DSGC is present by eye open- 
ing'*"* and the identification of DSGCs and their preferred directions is 
almost impossible before the onset of the light response, little is known 
about the developmental program that shapes the SAC-DSGC synapses. 


Here we use paired recordings and morphological reconstructions 
from a double-transgenic mouse line that selectively expresses two var- 
iants of green fluorescent protein (GFP) in SACs and nasal-preferring 
ON-OFF DSGCs (nDSGCs) to characterize the organization and the 
development of the precise wiring between SACs and DSGCs. These 
mice were generated by crossing two existing lines: Drd4-GFP mice, 
where Drd4 promoter-driven GFP expression is restricted to nDSGCs”’, 
and mGluR2-GFP mice (mGluR2 also known as Grm2), where a 
membrane-tethered human interleukin-2α/GFP fusion protein is
expressed specifically in SACs in the retina (Fig. 1a)”°. 

To detect functional GABAergic synapses between SACs and DSGCs, 
we performed targeted whole-cell voltage-clamp recordings from SAC- 
DSGC pairs in whole-mount retinas. To isolate GABAergic synapses, 
paired recordings were carried out in the presence of drugs that block 
excitatory synaptic transmission (Fig. 1b). Alexa dyes were included in 
the recording pipettes to visualize the dendritic morphology of the 
recorded pairs (Fig. 1c). Only pairs with overlapping dendritic fields 
were used for analysis. 

Paired recordings were carried out in postnatal-day-4 (P4), P7, P14 
and adult mice. At P4, GABAergic currents elicited by SAC depolariza- 
tion were detected in nDSGCs in 64% of pairs (16 of 25 pairs; Fig. 1b, d), 
indicating that synapse formation between SACs and nDSGCs 
occurred before and during the first postnatal week, confirming pre- 
vious findings’®. By P7, nearly all pairs showed unitary GABAergic 
connections (P7: 85%, 29 of 34 pairs), and this high level of connectivity 
persisted into adulthood (P14-48: 91%, 41 of 45 pairs; Fig. 1d). The 
evoked response was completely blocked by the GABA-A receptor
antagonist gabazine (5 µM, n = 4; data not shown), indicating that
the GABAergic transmission between SACs and nDSGCs is mediated
by GABA-A receptors. We note that the finding that connections were
readily detected between SACs located on the preferred side of DSGCs 
in adult mice is in contrast to previous findings in rabbit’. 

Though SACs located on both the preferred side and the null side 
formed GABAergic synapses with DSGCs, a significant asymmetry in 
the unitary synaptic strength emerged along the null-preferred axis 
during the second postnatal week. Synaptic strength was quantified 
as the GABA-A-receptor-mediated whole-cell conductance. These
measurements were restricted to the null-side and preferred-side pairs 
that had similar amounts of overlap between SAC processes and 
DSGC dendrites. Unexpectedly, at P4 and P7 the GABAergic conduc- 
tances from both groups were similar (Fig. 2). However, a significant 
increase in unitary conductance was detected in the null-side pairs but 
not in the preferred side pairs in retinas at P14 and older (Fig. 2). 
Hence, the establishment of the direction-selective circuits is mediated 
by an asymmetric increase in the strength of the unitary conductance 
between SACs and DSGCs in the week before eye opening. 
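
One common way to turn currents recorded at several holding potentials into a conductance is to fit a line to the current-voltage relationship and take its slope. The sketch below illustrates that approach only; the text does not state how the conductance was extracted, so the linear fit, the function name and the example values are assumptions.

```python
def conductance_and_reversal(voltages_mV, currents_pA):
    """Least-squares line through the I-V points.

    Returns (slope in nS, reversal potential in mV). With current in pA and
    voltage in mV, the slope pA/mV is numerically equal to nS.
    """
    n = len(voltages_mV)
    if n < 2 or n != len(currents_pA):
        raise ValueError("need at least two matched (V, I) points")
    mean_v = sum(voltages_mV) / n
    mean_i = sum(currents_pA) / n
    sxx = sum((v - mean_v) ** 2 for v in voltages_mV)
    if sxx == 0:
        raise ValueError("holding potentials must differ")
    sxy = sum((v - mean_v) * (i - mean_i)
              for v, i in zip(voltages_mV, currents_pA))
    slope = sxy / sxx
    intercept = mean_i - slope * mean_v
    # The fitted line crosses zero current at the reversal potential.
    reversal = -intercept / slope if slope != 0 else None
    return slope, reversal


# Hypothetical I-V points generated from g = 2 nS and a reversal of -60 mV.
holding_mV = [-70, -50, -30, -10]
peak_pA = [2 * (v + 60) for v in holding_mV]
g_nS, e_rev_mV = conductance_and_reversal(holding_mV, peak_pA)
```

Comparing such slopes between null-side and preferred-side pairs at each age is the kind of comparison summarized in Fig. 2c.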

The difference in GABAergic conductance from the null- and 
preferred-side SACs prompted us to examine two possibilities regarding 
the mechanisms underlying this strengthening. First we tested whether 
this functional asymmetry was correlated with the number or quality of 
contacts between SACs and nDSGCs, indicating a preferential adhesion 
Figure 1 | nDSGCs receive direct GABAergic inputs from SACs located on
the null and the preferred side from P4 until adult. a, Fluorescence image of
the ganglion cell layer from a P30 Drd4-GFP/mGluR2-GFP mouse, showing
the bright membrane-bound GFP expressed under the mGluR2 promoter in
the SACs and the dim cytoplasmic GFP driven by the Drd4 promoter in the
nDSGC. Scale bar, 25 µm. b, Paired whole-cell voltage-clamp recordings of
GABAergic currents in a P4 nDSGC (lower traces) evoked by depolarization of
a SAC from the null side (upper traces) in the presence of the NMDA (N-
methyl-D-aspartate) receptor antagonist D(−)-2-amino-5-phosphonovaleric
acid (AP5), α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid
(AMPA)/kainate receptor antagonist 6,7-dinitroquinoxaline-2,3-dione
(DNQX) and α4-containing nicotinic acetylcholine receptor antagonist
dihydro-β-erythroidine (DHβE). SACs were depolarized from −60 to 0 mV,


between SACs located on the null side and DSGCs’, which is an import- 
ant mechanism for dendritic differentiation and synaptogenesis in other 
systems”. After electrophysiological recording, the dendritic arbori- 
zations of the synaptically connected, Alexa-dye-filled SAC-nDSGC 
pairs from P14 to P48 were imaged live with a two-photon microscope 
and reconstructed using NEUROLUCIDA (Fig. 3a). We examined the 
overlapping region between the nDSGC dendrites and the distal portion 
(roughly the outer third) of the SAC processes enriched in varicosities, 
which are the sites of neurotransmitter release’. Crossing points 
between distal SAC processes and nDSGC dendrites were defined as 
‘contacts’ (Fig. 3a, inset). A subset of contacts exhibited cofascicula-
tion’’°”*, which were defined as 2-µm segments along which the pro-
cesses from the two cells remained in contact (Fig. 3a, inset). The
null- and preferred-side SAC-nDSGC pairs showed a similar density 
of contacts (Fig. 3b) and cofasciculations (Fig. 3c). No asymmetry was 
found when all of the SAC processes were included in the above analysis 
(Supplementary Fig. 1). Therefore, the functional asymmetry in 
GABAergic synapses does not involve selective adhesion between 
null-side SAC processes and DSGC dendrites”. 

The second possibility we tested was whether spontaneous retinal 
activity during the second postnatal week has a role in the establishment 
which reliably evoked an inward current in SACs. The postsynaptic GABAergic
currents were recorded in DSGCs at different holding potentials to determine
the current-voltage relationship of the conductance. c, Example images of
synaptically connected, dye-filled SAC-DSGC pairs at P4, P7 and P30. The left-
hand side shows pairs with SACs (green) located on the null side of the DSGCs
(red). The right-hand side shows preferred-side pairs. Scale bar, 50 µm. d, Soma
locations of the GABAergically connected SAC-nDSGC pairs along the null-
preferred axis during development. Red spots represent the positions of DSGC
cell bodies. The positions of SAC cell bodies that form GABAergic synapses
with their respective nDSGCs are shown as green spots; the SAC cell bodies that
were not connected to nDSGCs are shown as grey spots. All pairs had
overlapping dendritic fields. Scale bar, 25 µm.


of direction selectivity. DSGCs are depolarized by retinal waves, and 
activity could therefore potentially influence the synapse strengthen- 
ing’*. To this end, we first confirmed that the GFP-labelled nDSGCs in 
the Drd4-GFP mice showed a clear preference for nasal motion at eye 
opening (Fig. 4a) that was sensitive to the GABA-A receptor antagonist
gabazine (Supplementary Fig. 2), with a direction selectivity index
similar to those recorded in the adult (Fig. 4b). We then injected
muscimol, a GABA-A receptor agonist, intravitreally into Drd4-GFP
mice to block all spontaneous and evoked neural activity in the retina”®. 
In the presence of muscimol, evoked synaptic transmission from SACs 
to nDSGCs and spontaneous activity in both cell types were completely 
suppressed (Fig. 4c and Supplementary Fig. 3a, b). The effectiveness 
of muscimol injection at blocking activity in vivo was confirmed by 
examining eye-specific segregation of retinogeniculate projections, 
which is an independent measure of retinal activity (Supplementary 
Fig. 3c, d), and the persistence of fluorescently labelled muscimol in 
the retina at 48 h post-injection (Supplementary Fig. 3e). 

We assessed the responses of nDSGCs to stationary flashes and drifting
gratings in P14-15 mice that had received repeated muscimol injections
in the second postnatal week. Muscimol treatment did not prevent
the development of direction-selective responses or significantly reduce 
Figure 2 | GABAergic conductance in the null-side SAC-nDSGC pairs
strengthens during the second postnatal week. a, Postsynaptic GABAergic
currents in nDSGCs recorded at holding potentials between −70 and −10 mV in
response to depolarization (as in Fig. 1b) of null-side (left) and preferred-side
(right) SACs at P4, P7, P14 and P30. b, Relative soma positions of SAC-nDSGC
pairs used for conductance analysis at P4, P7 and P14-48. Open circles
represent nDSGC cell bodies. Filled circles are SAC somas colour-coded for
conductance strength normalized to the maximum value across all ages.
Dashed lines illustrate average dendritic arborization diameter, centred on the
asterisks, for nDSGCs (white) and SACs (red; asterisks represent average soma
locations). Scale bar, 25 µm. c, Summary plot of GABAergic conductances of
the null- and preferred-side SAC-nDSGC pairs at P4, P7 and P14-48.
Individual pairs (black) and mean ± s.d. (red) are shown. One-way analysis of
variance: P < 0.0001; t-test: *P < 0.0001, **P = 0.0003.
Figure 3 | Dendritic contacts and cofasciculations between SACs and
nDSGCs occur at similar densities for the null- and preferred-side pairs.
a, NEUROLUCIDA reconstructions of the dendrites from the ON sublamina
and side views of the complete dendritic arborizations from a null-side (left)
and a preferred-side (right) pair of SACs and nDSGCs. Dots represent dendritic
contacts, with cofasciculation segments coloured white and the rest coloured
purple. The GABAergic conductances for the null- and preferred-side pairs are
indicated. Scale bar, 25 µm. Inset, fluorescence image of the outlined region
showing crossing contacts (arrows) and cofasciculation (arrowhead). Scale bar,
5 µm. b, Summary plot of the density of total contacts between DSGCs and
distal SAC processes (roughly the outer third) from the null or preferred side
from P14 to P48. Individual pairs and mean ± s.d. are shown. The data points
for P28 and later are coloured blue, and the ones for before P28 are coloured
black. c, Summary plot of the density of cofasciculations between nDSGCs and
distal SAC processes from the same pairs as in b. Null- and preferred-side
groups are not significantly different in b and c. P > 0.7, t-test.
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Figure 4 | Intraocular injections of muscimol or gabazine do not alter 
direction selectivity in nDSGCs. a, The normalized spike vector sums of 
nDSGCs in response to drifting gratings of 12 directions from P14—15 Drd4- 
GFP mice that received either no treatment (control) or intraocular injections 
of saline, muscimol or gabazine from P6 to P12. D, dorsal; N, nasal; T, temporal; 
V, ventral. The red lines are mean vector sums of all cells in each group. Insets, 
examples of normalized tuning curves of single cells, with corresponding vector 
sums represented as red lines of nDSGCs from each group. Control: n = 4 mice, 
12 cells; saline: n = 11 mice, 43 cells; muscimol: n = 12 mice, 25 cells; gabazine: 
n= 4 mice, 17 cells. b, Summary plot of direction selectivity index (DSI) for 


directional tuning of nDSGCs (Fig. 4a, b). Normal ON and OFF light 
responses were also present in the muscimol-treated group, although 
there was an increase in the number of cells that did not respond to 
gratings in the muscimol-treated group (Supplementary Fig. 4). 

In visual cortex, activation of GABA, receptors is required for 
maturation of GABAergic synapses”. To test the hypothesis that 
GABA, receptor activation is required for the development of direction 
selectivity, we performed intravitreal injections of the GABA, receptor 
antagonist gabazine into Drd4—-GFP mice during the second postnatal 
week. Gabazine treatment did not prevent the development of 
direction-selective responses of GFP-positive cells to drifting gratings 
(Fig. 4a, b). Therefore, the development of direction selectivity arises 
independently of the activation of GABA, receptors. 

To begin exploring the synaptic basis of this increase in conductance 
between null-side SACs and DSGCs, we recorded the spontaneous 
inhibitory postsynaptic currents (IPSCs) from nDSGCs at P7 and 
P14. We found a significant increase in the frequency and no signifi- 
cant change in the amplitude of the GABAergic IPSCs, although there 
was a trend towards larger IPSC amplitudes at P14 (Supplementary 
Fig. 5). This result is consistent with the hypothesis that the stronger 
GABAergic unitary conductance for the null-side pairs is primarily 
due to increased numbers of functional GABAergic synapses. 
However, we cannot tell whether the spontaneous IPSCs originate 
from the null- or preferred-side SACs. Further study is required to 
determine the relative role increases in synapse number versus synapse 
strength have in the increase in the unitary conductance between null- 
side SACs and DSGCs. 

Our study demonstrates that asymmetric inhibition arises during 
the second postnatal week through selective strengthening of the 
GABAergic conductance from SACs on the null sides of DSGCs. 
Morphological analysis revealed a similar degree of dendritic contact 
and cofasciculation between SACs on the null or preferred side, indi- 
cating that the synapse development is dissociated from physical 
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adult (>P28), P14-15 untreated, saline, muscimol and gabazine-treated 
groups. Bars show mean + s.e.; open circles represent individual cells. Adult 
data are reproduced from ref. 19. c, Example traces from whole-cell voltage- 
clamp recordings of inhibitory (upper traces, Vy = 0 mV) and excitatory 
(lower traces, Vj; = —50 mV) currents from a P14 nDSGC in drug-free 
artificial cerebrospinal fluid (control, left) or artificial cerebrospinal fluid 
containing 100 14M muscimol (right). Deflections from baseline correspond to 
spontaneous synaptic currents. At depolarized potentials, application of 
muscimol activated a tonic current, which was measured as a change in the 
baseline holding current”®. 


encounters between SAC processes and DSGC dendrites, as was 
recently found in barrel cortex”*. 

In addition, we found that blocking depolarization-induced activity
or GABA(A) receptor activation did not affect the establishment of
direction selectivity in the retina, in sharp contrast to direction-selective
cells in the visual cortex (ref. 29). This finding lends support to previous studies
showing that early visual experience’*’* or cholinergic retinal waves'® 
are not involved in establishing retinal direction selectivity. Therefore, 
the mechanism underlying the development of retinal direction selec- 
tivity is an asymmetric increase in the strength of the inhibitory unitary 
conductance between SACs and DSGCs in the week before eye opening, 
without the establishment of asymmetrical dendritic contacts and inde- 
pendent of spontaneous neural activity. 


METHODS SUMMARY 


We performed dual whole-cell voltage-clamp recordings from SAC-nDSGC pairs in
oxygenated artificial cerebrospinal fluid at 32-34 °C containing 119.0 mM NaCl,
26.2 mM NaHCO3, 11 mM glucose, 2.5 mM KCl, 1.0 mM K2HPO4, 2.5 mM
CaCl2, 1.3 mM MgCl2, 0.05 mM AP5, 0.02 mM DNQX and 0.008 mM DHβE.
Recording electrodes of 3-5 MΩ were filled with an internal solution containing
110 mM CsMeSO4, 2.8 mM NaCl, 4 mM EGTA, 5 mM TEA-Cl, 4 mM adenosine
5'-triphosphate (magnesium salt), 0.3 mM guanosine 5'-triphosphate (trisodium
salt), 20 mM HEPES and 10 mM phosphocreatine (disodium salt), 0.025 mM
Alexa 488 (for SACs) and 0.025 mM Alexa 594 (for nDSGCs), pH 7.2. Data were
acquired using PCLAMP 10 recording software and a Multiclamp 700A amplifier
(Molecular Devices). The GABAergic conductance was calculated from the linear
portion of the current-voltage curve for the SAC-evoked currents in nDSGCs. After
recording, we imaged the dye-filled SACs and the nDSGCs using a custom-modified
two-photon microscope as described previously (FluoView 300, Olympus
America) at 745 nm. Images were acquired at z intervals of 0.5 µm using a ×60
objective (Olympus LUMPlanFI/IR ×60/0.90W). SAC and nDSGC processes were
reconstructed from image stacks with NEUROLUCIDA. For in vivo injections, we
anaesthetized animals with 3.5% isoflurane/2% O2. The eyelid was then opened with
fine forceps, and 1 µl of 10 mM muscimol, 500 µM gabazine or saline was injected
using a fine glass micropipette. Injections were made with a picospritzer (World
Precision Instruments) generating 20-p.s.i., 3-ms-long positive pressure. We
repeated this procedure every 48 h, starting at P6 and ending at P12. Whole-mount
retina preparation and two-photon targeted recording for light responses were
performed according to previously described techniques (ref. 30). The direction
selectivity index was computed as previously described.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


Received 31 May; accepted 13 October 2010. 
Published online 5 December 2010. 


1. Euler, T., Detwiler, P. B. & Denk, W. Directionally selective calcium signals in
dendrites of starburst amacrine cells. Nature 418, 845-852 (2002).
2. Demb, J. B. Cellular mechanisms for direction selectivity in the retina. Neuron 55,
179-186 (2007).
3. Fried, S. I., Munch, T. A. & Werblin, F. S. Mechanisms and circuitry underlying
directional selectivity in the retina. Nature 420, 411-414 (2002).
4. Barlow, H. B. & Levick, W. R. The mechanism of directionally selective units in
rabbit’s retina. J. Physiol. (Lond.) 178, 477-504 (1965).
5. Oyster, C. W. The analysis of image motion by the rabbit retina. J. Physiol. (Lond.)
199, 613-635 (1968).
6. Ariel, M. & Daw, N. W. Pharmacological analysis of directionally sensitive rabbit
retinal ganglion cells. J. Physiol. (Lond.) 324, 161-185 (1982).
7. Kittila, C. A. & Massey, S. C. Effect of ON pathway blockade on directional selectivity
in the rabbit retina. J. Neurophysiol. 73, 703-712 (1995).
8. Weng, S., Sun, W. & He, S. Identification of ON-OFF direction-selective ganglion
cells in the mouse retina. J. Physiol. (Lond.) 562, 915-923 (2005).
9. Famiglietti, E. V. Synaptic organization of starburst amacrine cells in rabbit retina:
analysis of serial thin sections by electron microscopy and graphic reconstruction.
J. Comp. Neurol. 309, 40-70 (1991).
10. Stacy, R. C. & Wong, R. O. Developmental relationship between cholinergic
amacrine cell processes and ganglion cell dendrites of the mouse retina. J. Comp.
Neurol. 456, 154-166 (2003).
11. Yoshida, K. et al. A key role of starburst amacrine cells in originating retinal
directional selectivity and optokinetic eye movement. Neuron 30, 771-780
(2001).
12. Amthor, F. R., Keyser, K. T. & Dmitrieva, N. A. Effects of the destruction of
starburst-cholinergic amacrine cells by the toxin AF64A on rabbit retinal directional
selectivity. Vis. Neurosci. 19, 495-509 (2002).
13. Chen, Y. C. & Chiao, C. C. Symmetric synaptic patterns between starburst amacrine
cells and direction selective ganglion cells in the rabbit retina. J. Comp. Neurol. 508,
175-183 (2008).
14. Famiglietti, E. V. A structural basis for omnidirectional connections between
starburst amacrine cells and directionally selective ganglion cells in rabbit retina,
with associated bipolar cells. Vis. Neurosci. 19, 145-162 (2002).
15. Jeon, C. J. et al. Pattern of synaptic excitation and inhibition upon
direction-selective retinal ganglion cells. J. Comp. Neurol. 449, 195-205 (2002).
16. Chan, Y. C. & Chiao, C. C. Effect of visual experience on the maturation of ON-OFF
direction selective ganglion cells in the rabbit retina. Vision Res. 48, 2466-2475
(2008).
17. Chen, M., Weng, S., Deng, Q., Xu, Z. & He, S. Physiological properties of
direction-selective ganglion cells in early postnatal and adult mouse retina.
J. Physiol. (Lond.) 587, 819-828 (2009).
18. Elstrott, J. et al. Direction selectivity in the retina is established independent of
visual experience and cholinergic retinal waves. Neuron 58, 499-506 (2008).
19. Huberman, A. D. et al. Genetic identification of an On-Off direction-selective retinal
ganglion cell subtype reveals a layer-specific subcortical map of posterior motion.
Neuron 62, 327-334 (2009).
20. Watanabe, D. et al. Ablation of cerebellar Golgi cells disrupts synaptic integration
involving GABA inhibition and NMDA receptor activation in motor coordination.
Cell 95, 17-27 (1998).
21. Togashi, H. et al. Cadherin regulates dendritic spine morphogenesis. Neuron 35,
77-89 (2002).
22. Zhu, H. & Luo, L. Diverse functions of N-cadherin in dendritic and axonal terminal
arborization of olfactory projection neurons. Neuron 42, 63-75 (2004).
23. Dong, W., Sun, W., Zhang, Y., Chen, X. & He, S. Dendritic relationship between
starburst amacrine cells and direction-selective ganglion cells in the rabbit retina.
J. Physiol. (Lond.) 556, 11-17 (2004).
24. Vaney, D. I., Collin, S. P. & Young, H. M. in Neurobiology Of The Inner Retina (eds
Weiler, R. & Osborne, N. N.) 157-168 (Springer, 1989).
25. Elstrott, J. & Feller, M. B. Direction-selective ganglion cells show symmetric
participation in retinal waves during development. J. Neurosci. 30, 11197-11201
(2010).
26. Wang, C. T. et al. GABA(A) receptor-mediated signaling alters the structure of
spontaneous activity in the developing retina. J. Neurosci. 27, 9130-9140 (2007).
27. Huang, Z. J. Activity-dependent development of inhibitory synapses and
innervation pattern: role of GABA signalling and beyond. J. Physiol. (Lond.) 587,
1881-1888 (2009).
28. Petreanu, L., Mao, T., Sternson, S. M. & Svoboda, K. The subcellular organization of
neocortical excitatory connections. Nature 457, 1142-1145 (2009).
29. Li, Y., Van Hooser, S. D., Mazurek, M., White, L. E. & Fitzpatrick, D. Experience with
moving visual stimuli drives the early development of cortical direction selectivity.
Nature 456, 952-956 (2008).
30. Wei, W., Elstrott, J. & Feller, M. B. Two-photon targeted recording of GFP-expressing
neurons for light responses and live-cell imaging in the mouse retina. Nature
Protocols 5, 1347-1352 (2010).


Supplementary Information is linked to the online version of the paper at
www.nature.com/nature.

Acknowledgements We thank S. Nakanishi for mGluR2-GFP mice, A. Huberman for
Drd4-GFP mice, J. Elstrott for help with MATLAB software, X. Han for mouse genotyping,
J. Ledue for imaging assistance and A. Blankenship for reading the manuscript. This
work was supported by grants RO1 EY013528 and ARRA EY019498 from the National
Institutes of Health.


Author Contributions W.W. conducted the electrophysiology and imaging 
experiments, and manuscript preparation; A.M.H. conducted intraocular injections, 
analysis of retinogeniculate projection patterns and manuscript preparation. K.Z. 
conducted NEUROLUCIDA reconstructions and analysis. M.B.F. was involved in the 
experimental design, data analysis of Supplementary Fig 3c-e and manuscript 
preparation. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of this article at 
www.nature.com/nature. Correspondence and requests for materials should be 
addressed to M.B.F. (mfeller@berkeley.edu). 


©2010 Macmillan Publishers Limited. All rights reserved 


METHODS 

Mice. Drd4-GFP mice in the Swiss Webster background were obtained from
MMRRC (http://www.mmrrc.org/strains/231/0231.html), and mGluR2-GFP
mice were a gift from Shigetada Nakanishi, Osaka. Both strains were backcrossed
to the C57BL/6 background in our laboratory. The Drd4-GFP/mGluR2-GFP
double-transgenic mice were obtained by crossing the two single-transgenic lines.
Whole-cell patch-clamp recording. Single or dual whole-cell voltage-clamp
recordings from SACs and nDSGCs were performed in oxygenated artificial cerebrospinal
fluid at 32-34 °C containing 119.0 mM NaCl, 26.2 mM NaHCO3, 11 mM glucose,
2.5 mM KCl, 1.0 mM K2HPO4, 2.5 mM CaCl2, 1.3 mM MgCl2, 0.05 mM AP5,
0.02 mM DNQX and 0.008 mM DHβE. Recording electrodes of 3-5 MΩ were
filled with an internal solution containing 110 mM CsMeSO4, 2.8 mM NaCl,
4 mM EGTA, 5 mM TEA-Cl, 4 mM adenosine 5'-triphosphate (magnesium salt),
0.3 mM guanosine 5'-triphosphate (trisodium salt), 20 mM HEPES and 10 mM
phosphocreatine (disodium salt), 0.025 mM Alexa 488 (for SACs) and 0.025 mM
Alexa 594 (for nDSGCs), pH 7.25. Data were acquired using PCLAMP 10 recording
software and a Multiclamp 700A amplifier (Molecular Devices), filtered at
4 kHz and digitized at a sampling rate of 10 kHz. The GABAergic whole-cell
conductance was calculated from the linear portion of the current-voltage curve
for the SAC-evoked currents in nDSGCs and analysed using MATLAB software.
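The conductance estimate described above amounts to fitting a straight line to the current-voltage relation and reading off its slope. A minimal Python sketch follows, assuming currents in pA and holding potentials in mV (so the slope comes out in nS). The function name `fit_conductance` and the example values are hypothetical illustrations; this is not the authors' MATLAB analysis, and it does not select the linear portion of the curve for you.

```python
def fit_conductance(voltages_mv, currents_pa):
    """Least-squares line through (V, I) points.

    Returns (conductance_nS, reversal_potential_mV). With V in mV and I in pA
    the slope has units of pA/mV, which equals nS.
    """
    n = len(voltages_mv)
    if n < 2 or n != len(currents_pa):
        raise ValueError("need at least two matched (V, I) points")
    mean_v = sum(voltages_mv) / n
    mean_i = sum(currents_pa) / n
    sxx = sum((v - mean_v) ** 2 for v in voltages_mv)
    if sxx == 0:
        raise ValueError("holding potentials must not all be equal")
    sxy = sum((v - mean_v) * (i - mean_i)
              for v, i in zip(voltages_mv, currents_pa))
    slope = sxy / sxx                      # conductance in nS
    if slope == 0:
        raise ValueError("zero slope: reversal potential is undefined")
    intercept = mean_i - slope * mean_v    # current at 0 mV, in pA
    reversal = -intercept / slope          # voltage where the fitted current is zero
    return slope, reversal


# Hypothetical example: an ideal 2 nS conductance reversing at -60 mV,
# sampled at holding potentials between -70 and -10 mV.
holding = [-70, -50, -30, -10]
peak_currents = [2.0 * (v + 60.0) for v in holding]
conductance, e_rev = fit_conductance(holding, peak_currents)
```

Only the points that lie on the linear part of the measured curve should be passed in; which points qualify is an experimental judgement that this sketch leaves to the caller.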
Two-photon targeted loose-patch recording of GFP-positive neurons for light
response. Drd4-GFP mice were anaesthetized with isoflurane and decapitated in
accordance with the UC Berkeley Institutional Animal Care and Use Committee
and in conformance with the NIH Guide for the Care and Use of Laboratory
Animals, the Public Health Service Policy and the SFN Policy on the Use of
Animals in Neuroscience Research. Under infrared illumination, retinas were
isolated from the pigment epithelium in oxygenated Ames’ medium (Sigma), cut into
dorsal and ventral halves, and mounted over a hole of 1-1.5 mm² on filter paper
(Millipore) with the photoreceptor layer facing down. Retinas were kept in darkness
at 25 °C in Ames’ medium bubbled with 95% O2/5% CO2 until use (0-7 h).
Recording electrodes of 3-5 MΩ were filled with Ames’ medium. GFP fluorescence
was detected with a custom-built, FluoView-based two-photon microscope and a 
Ti:sapphire laser (Coherent) tuned to 920 nm, a wavelength that minimally acti- 
vates mouse photoreceptors and therefore preserves light response. GFP cells were 
then targeted for loose-patch recordings using PCLAMP 10 recording software and 
a Multiclamp 700A amplifier. 


Visual stimuli were generated as previously described. Briefly, a white,
monochromatic organic light-emitting display (OLEDXL, eMagin; 800 × 600 pixel
resolution, 85-Hz refresh rate) was controlled by an Intel Core Duo computer
with a Windows XP operating system. Drifting square-wave gratings (spatial
frequency, 225 µm per cycle; temporal frequency, 4 cycles s⁻¹; 30° s⁻¹ in 12
pseudorandomly chosen directions spaced at 30° intervals, with each presentation
lasting 3 s and followed by 500 ms of grey screen) were generated from the OLED
using MATLAB and the Psychophysics Toolbox, and were projected through the
×60 water-immersion objective (LUMPlanFI/IR, NA 0.9) via the side port of the
microscope, centred on the soma of the recorded cell and focused on the
photoreceptor layer. Loose-patch recordings were obtained during the stimulus
presentation and analysed using MATLAB. A detailed, step-by-step protocol of the
two-photon targeted recording of light response can be found in ref. 30.
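Responses to the 12 grating directions are summarized in the figures as a normalized vector sum and a direction selectivity index (DSI). The Python sketch below shows one common way to compute both from spike counts. The exact definitions used in the paper are given only by citation, so the formulas here are assumptions: the vector sum is normalized by the total spike count, and the DSI is taken as (preferred − null)/(preferred + null), with the preferred direction chosen as the direction of maximal response. Function names are hypothetical.

```python
import math


def vector_sum(directions_deg, spike_counts):
    """Normalized vector sum of responses; returns (length, angle_deg).

    Each response is treated as a vector pointing in its stimulus direction.
    The length is divided by the total spike count, so it lies between 0 and 1.
    """
    total = sum(spike_counts)
    if total == 0:
        return 0.0, 0.0
    x = sum(r * math.cos(math.radians(d))
            for d, r in zip(directions_deg, spike_counts))
    y = sum(r * math.sin(math.radians(d))
            for d, r in zip(directions_deg, spike_counts))
    length = math.hypot(x, y) / total
    angle = math.degrees(math.atan2(y, x)) % 360.0
    return length, angle


def direction_selectivity_index(directions_deg, spike_counts):
    """Assumed DSI: (pref - null) / (pref + null), null = pref + 180 degrees."""
    pref_idx = max(range(len(spike_counts)), key=lambda k: spike_counts[k])
    pref_dir = directions_deg[pref_idx]
    null_dir = (pref_dir + 180) % 360
    pref = spike_counts[pref_idx]
    null = spike_counts[directions_deg.index(null_dir)]
    if pref + null == 0:
        return 0.0
    return (pref - null) / (pref + null)


# Hypothetical cosine-tuned cell preferring 90 degrees, probed at 30-degree steps.
directions = list(range(0, 360, 30))
counts = [10 * (1 + math.cos(math.radians(d - 90))) for d in directions]
length, angle = vector_sum(directions, counts)
dsi = direction_selectivity_index(directions, counts)
```

The DSI function requires that the direction opposite to the preferred one was actually presented, which holds for an even number of equally spaced directions such as the 12 used here.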
Two-photon microscopy and morphological reconstruction. After paired 
recording, the Alexa-488-filled SACs and the Alexa-594-filled nDSGCs in the 
Drd4-GFP/mGluR2-GFP mice were imaged using the two-photon microscope 
at 745 nm. At this wavelength, GFP is not efficiently excited but both Alexa 488 
and Alexa 594 are brightly fluorescent. Therefore, the morphology of the Alexa- 
488-filled SACs could be distinguished from the very weak GFP fluorescence. 
Image stacks were acquired at z intervals of 0.5 µm and resampled three times
for each stack using a ×60 objective (Olympus LUMPlanFI/IR ×60/0.90W),
covering the entire dendritic fields of the SACs and nDSGCs. Image stacks from 
25 SAC-nDSGC pairs were then imported into NEUROLUCIDA (MBF 
Biosciences) and reconstructed in three dimensions. The densities of contacts 
and cofasciculations were measured from the reconstructions. 

Intraocular injections. Drd4-GFP animals were anaesthetized with 3.5% isoflurane/
2% O2. The eyelid was then opened with fine forceps, and 1 µl of 10 mM muscimol
(Tocris), 500 µM gabazine (Tocris) or saline was injected using a fine glass
micropipette. Injections were made with a picospritzer (World Precision Instruments)
generating 20-p.s.i., 3-ms-long positive pressure. To prevent efflux of the injected 
solution, removal of the pipette tip from the eye was done slowly and gentle pressure 
was then applied to the injection site with a sterile cotton swab for ~10s. This 
procedure was repeated every 48 h, starting at P6 and ending at P12. 

Statistical analysis. Grouped data are presented as mean ± s.d. or s.e.m. as
indicated. Data sets were tested for normality, and statistical differences were
examined using one-way analysis of variance and post hoc comparisons using Student’s
t-test with Bonferroni corrections (MATLAB).

A high C/O ratio and weak thermal inversion in the
atmosphere of exoplanet WASP-12b

The carbon-to-oxygen ratio (C/O) in a planet provides critical
information about its primordial origins and subsequent evolution.
A primordial C/O greater than 0.8 causes a carbide-dominated
interior, as opposed to the silicate-dominated composition found
on Earth; the atmosphere can also differ from those in the Solar
System. The solar C/O is 0.54 (ref. 3). Here we report an analysis of
dayside multi-wavelength photometry of the transiting hot-Jupiter
WASP-12b (ref. 6) that reveals C/O ≥ 1 in its atmosphere.
The atmosphere is abundant in CO. It is depleted in water vapour
and enhanced in methane, each by more than two orders of magnitude
compared to a solar-abundance chemical-equilibrium model
at the expected temperatures. We also find that the extremely
irradiated atmosphere (T > 2,500 K) of WASP-12b lacks a prominent
thermal inversion (or stratosphere) and has very efficient day-night
energy circulation. The absence of a strong thermal inversion is in
stark contrast to theoretical predictions for the most highly
irradiated hot-Jupiter atmospheres.

The transiting hot Jupiter WASP-12b orbits a star slightly hotter
than the Sun (6,300 K) in a circular orbit at a distance of only
0.023 astronomical units (AU), making it one of the hottest exoplanets
known. Thermal emission from the dayside atmosphere of WASP-12b
has been reported using the Spitzer Space Telescope, at 3.6 µm,
4.5 µm, 5.8 µm and 8 µm wavelengths, and from ground-based
observations in the J (1.2 µm), H (1.6 µm) and Ks (2.1 µm) bands (Fig. 1).

The observations provide constraints on the dayside atmospheric
composition and thermal structure, based on the dominant opacity
source in each bandpass. The J, H and Ks channels have limited
molecular absorption features, and hence probe the deep layers of
the planetary atmosphere, at pressure P ~ 1 bar, where the temperature
T ~ 3,000 K (Fig. 1). The Spitzer observations, on the other
hand, are excellent probes of molecular composition. CH4 has strong
absorption features in the 3.6-µm and 8-µm channels, CO has strong
absorption in the 4.5-µm channel, and H2O has its strongest feature in
the 5.8-µm channel and weaker features in the 3.6-µm, 4.5-µm and
8-µm channels. The low brightness temperatures in the 3.6-µm
(2,700 K) and 4.5-µm (2,500 K) channels, therefore, clearly suggest
strong absorption due to CH4 and CO, respectively. The high brightness
temperature in the 5.8-µm channel, on the other hand, indicates
low absorption due to H2O. The strong CO absorption in the 4.5-µm
channel also indicates temperature decreasing with altitude, because a
thermal inversion would cause emission features of CO in the same
channel with a significantly higher flux than at 3.6 µm (refs 11 and 12).
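The argument above is phrased in terms of brightness temperature, that is, the temperature of a blackbody that would emit the observed intensity at a given wavelength. As a minimal Python sketch of that conversion, the Planck function and its inverse can be written as follows. The function names are hypothetical, SI units are assumed throughout, and turning a measured planet-to-star flux ratio into a planetary intensity (which needs stellar and planetary radii and a stellar spectrum) is deliberately left out.

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant, J s
C_LIGHT = 2.99792458e8      # speed of light, m/s
K_BOLTZ = 1.380649e-23      # Boltzmann constant, J/K


def planck_intensity(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H_PLANCK * C_LIGHT / (wavelength_m * K_BOLTZ * temperature_k)
    prefactor = 2.0 * H_PLANCK * C_LIGHT ** 2 / wavelength_m ** 5
    return prefactor / math.expm1(x)


def brightness_temperature(wavelength_m, intensity):
    """Temperature of the blackbody that emits `intensity` at this wavelength."""
    prefactor = 2.0 * H_PLANCK * C_LIGHT ** 2 / wavelength_m ** 5
    x = math.log1p(prefactor / intensity)
    return H_PLANCK * C_LIGHT / (wavelength_m * K_BOLTZ * x)


# Round trip at 4.5 micrometres for a 2,500 K blackbody.
wavelength = 4.5e-6
radiance = planck_intensity(wavelength, 2500.0)
recovered = brightness_temperature(wavelength, radiance)
```

A channel dominated by a strong absorber samples higher, cooler layers when temperature falls with altitude, so its brightness temperature is lower than that of a more transparent channel; this is the reading applied to the 3.6-µm and 4.5-µm channels in the text.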

Figure 1 | Observations and model spectra for dayside thermal emission of
WASP-12b. F is the flux. The black filled circles with error bars show the data
with 1 s.d. errors: four Spitzer observations (3.6 µm, 4.5 µm, 5.8 µm and 8 µm),
and three ground-based observations in the J (1.2 µm), H (1.6 µm), and Ks
(2.1 µm) bands. Four models fitting the observations are shown in the coloured
solid curves in the main panel, and the coloured circles are the
channel-integrated model points. The corresponding temperature profiles are shown in
the inset. The molecular compositions are shown as number ratio with respect
to molecular hydrogen; all the models have C/O between 1 and 1.1. The thin
grey dotted lines show the blackbody spectra of WASP-12b at 2,000 K (bottom),
2,500 K (middle) and 3,000 K (top). A Kurucz model was used for the stellar
spectrum, assuming uniform illumination over the planetary disk (that is,
weighted by 0.5; ref. 7). The black solid lines at the bottom show the
photometric band-passes in arbitrary units. The low fluxes at 3.6 µm and
4.5 µm are explained by methane and CO absorption, respectively, required for
all the models that fit. The high flux in the 5.8-µm channel indicates less
absorption due to H2O. The observations can be explained to high precision by
models without thermal inversions. Models with strong thermal inversions are
ruled out by the data (see Fig. 3). The green model features a thermal inversion
at low pressures (P < 0.01 bar), but the corresponding spectrum is almost
indistinguishable from the purple model, which does not have a thermal
inversion; both models have identical compositions and identical thermal
profiles for P > 0.01 bar. Thus, any potential thermal inversion is too weak to be
detectable by current instruments.

The broadband observations allow us to infer the chemical composition
and temperature structure of the dayside atmosphere of
WASP-12b using a statistical retrieval technique. We combined a
one-dimensional atmosphere model with a Markov-chain Monte
Carlo sampler that computes over 4 × 10^6 models to explore the
parameter space. The phase space included thermal profiles with and
without inversions, and equilibrium and non-equilibrium chemistry
over a wide range of atomic abundances. Our models include the
dominant sources of infrared opacity in the temperature regime of
WASP-12b (refs 14, 15 and 16): H2O, CO, CH4, CO2, H2-H2
collision-induced absorption, and TiO and VO where the temperatures
are high enough for them to exist in the gas phase. The host star has
a significantly enhanced metallicity (2 × solar), and evolutionary
processes can further enhance the abundances; Jupiter has 3 × solar
C/H (ref. 18). Our models therefore explore wide abundance ranges:
0.01-100 × solar for C/H and O/H, and 0.1-10 × solar for C/O.
Figure 2 shows the mixing ratios of H2O, CO, CH4 and CO2 and the
ratios of C/H, O/H and C/O required by the models at different levels
of fit. Figure 3 presents the temperature profiles.
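The retrieval described above pairs a forward model with a Markov-chain Monte Carlo sampler and ranks each model realization by its goodness of fit. As an illustration of the sampling idea only, here is a minimal Metropolis sketch in Python. The one-parameter linear forward model, the step size and all names are hypothetical stand-ins; the actual retrieval uses a one-dimensional atmosphere model with many parameters, and its sampler settings are not given in this text.

```python
import math
import random


def chi_square(theta, xs, ys, sigmas):
    """Chi-square misfit for the toy forward model y = theta * x."""
    return sum(((y - theta * x) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))


def metropolis(xs, ys, sigmas, theta0, step, n_steps, seed=0):
    """Random-walk Metropolis sampler; returns a list of (theta, chi2) states."""
    rng = random.Random(seed)
    theta = theta0
    chi2 = chi_square(theta, xs, ys, sigmas)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        chi2_new = chi_square(proposal, xs, ys, sigmas)
        delta = chi2_new - chi2
        # Always accept a better fit; accept a worse one with probability
        # exp(-delta / 2), which corresponds to Gaussian errors.
        if delta <= 0 or rng.random() < math.exp(-0.5 * delta):
            theta, chi2 = proposal, chi2_new
        chain.append((theta, chi2))
    return chain


# Hypothetical noiseless data generated with theta = 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]
chain = metropolis(xs, ys, [0.5] * 5, theta0=0.0, step=0.1, n_steps=5000, seed=1)
samples = [theta for theta, _ in chain[1000:]]   # drop burn-in
posterior_mean = sum(samples) / len(samples)
```

Keeping the chi-square value alongside each sample makes it possible to group realizations by level of fit afterwards, which is how the figures in this paper present the explored models.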

We find a surprising lack of water and overabundance of methane
(Fig. 2). At 2,000-3,000 K, assuming solar abundances yields CO and
H2O as the dominant species besides H2 and He (refs 15 and 16). Most
of the carbon, and the same amount of oxygen, are present in CO, and
some carbon exists as CH4. The remaining oxygen in a hydrogen-dominated
atmosphere is mostly in H2O; small amounts are also present
in species such as CO2. The CO/H2 and H2O/H2 mixing ratios
should each be >5 X 107 *, CH,/H) should be <10~ 8, and CO,/H, 
should be about 10 *, under equilibrium conditions at a nominal 
pressure of 0.1 bar. The requirement of H,O/H2 =6 X 10 ° and 
CH,/H, = 8 X 10 ° (both at 3a, 99.73% significance; Fig. 2) is there- 
fore inconsistent with equilibrium chemistry using solar abundances. 
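The element bookkeeping behind this paragraph can be illustrated with a deliberately crude Python sketch: the scarcer of carbon and oxygen is locked into CO, leftover oxygen goes to H2O and leftover carbon to CH4. This is not an equilibrium-chemistry calculation. It ignores CO2, the small CH4 fraction present even when oxygen is in excess, and all temperature and pressure dependence, and the numerical abundances in the example are illustrative placeholders rather than values from the paper. It assumes all hydrogen is in H2, so a ratio relative to H2 is twice the ratio relative to H.

```python
def partition_carbon_oxygen(c_over_h, o_over_h):
    """Toy split of C and O into CO, H2O and CH4, as ratios relative to H2.

    Assumption: CO takes up as much C and O as it can; whatever oxygen is left
    forms H2O and whatever carbon is left forms CH4.
    """
    if c_over_h < 0 or o_over_h < 0:
        raise ValueError("abundances must be non-negative")
    co = min(c_over_h, o_over_h)
    h2o = max(o_over_h - c_over_h, 0.0)
    ch4 = max(c_over_h - o_over_h, 0.0)
    # Factor of 2 converts "per H atom" into "per H2 molecule".
    return {"CO": 2.0 * co, "H2O": 2.0 * h2o, "CH4": 2.0 * ch4}


# Placeholder abundances: C/O = 0.54 versus C/O = 1 at the same O/H.
oxygen = 5.0e-4
solar_like = partition_carbon_oxygen(c_over_h=0.54 * oxygen, o_over_h=oxygen)
carbon_rich = partition_carbon_oxygen(c_over_h=1.0 * oxygen, o_over_h=oxygen)
```

Even this toy shows the qualitative point: at C/O below 1 a large oxygen surplus is available for water, whereas at C/O of 1 or more that surplus disappears, so water is strongly depleted while carbon-bearing species other than CO can grow.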

The observations place a strict constraint on the C/O ratio. We
detect C/O ≥ 1 at 3σ significance (Fig. 2). Our results rule out a solar
C/O of 0.54 at 4.2σ. Our calculations of equilibrium chemistry
using C/O = 1 yield mixing ratios of H2O, CO and CH4 that are
consistent with the observed constraints. We find that, for C/O = 1,
H,0 mixing ratios as low as 10’ and CH, mixing ratios as high as 
10 ° can be attained at the 0.1-1 bar level for temperatures around 
2,000 K and higher. And, although the CO mixing ratio is predicted to 
be >10 *, making it the dominant molecule after Hj and He, CO, is 
predicted to be negligible (<10 °). The theoretical predictions for a
C/O = 1 atmosphere are consistent with the observed constraints on
H2O, CH4, CO and CO2 (Fig. 2).

The observations rule out a strong thermal inversion deeper than
0.01 bar (Fig. 3). Thermal inversions at lower pressures have opacities
too low to induce features in the emission spectrum that current
instruments can resolve. For comparison, all stratospheric inversions
in Solar System giant planets, and those consistent with hot-Jupiter
observations, exist at pressures between 0.01 bar and 1 bar.
The major contributions to all the observations come from the lower
layers of the atmosphere, P > 0.01 bar, where we rule out a thermal
inversion (Supplementary Fig. 1). The observations also suggest very
efficient day-night energy redistribution (Fig. 2). The low brightness
temperatures at 3.6 µm and 4.5 µm imply that only part of the incident
stellar energy is re-radiated from the dayside, whereas up to 45% is
absorbed and redistributed to the nightside. The possibility of a deep
thermal inversion and inefficient redistribution was suggested
recently, based on observations in the J, H and Ks channels, but the
Spitzer observations rule out both conditions.
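The redistribution constraint is expressed in the Fig. 2 caption through the product (1 − A)(1 − f), where A is the bond albedo and f is the fraction of incident energy carried to the nightside. A small Python sketch of that factor follows. The function and parameter names are hypothetical, and reading the product as the share of incident energy left to be re-radiated from the dayside is an interpretation of the text rather than a formula quoted from it.

```python
def dayside_energy_factor(bond_albedo, night_fraction):
    """Return (1 - A) * (1 - f) for bond albedo A and nightside fraction f."""
    for name, value in (("bond_albedo", bond_albedo),
                        ("night_fraction", night_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return (1.0 - bond_albedo) * (1.0 - night_fraction)


# The limiting case quoted in the text: zero albedo, 45% sent to the nightside.
factor = dayside_energy_factor(bond_albedo=0.0, night_fraction=0.45)
```

Because only the product is constrained, a non-zero albedo would lower the nightside fraction needed to match the same observations, which is why the 45% figure is stated as an upper limit for A = 0.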

The lack of a prominent thermal inversion contrasts with recent
work that designates WASP-12b as a member of the class of very hot
Jupiters that are expected to host inversions. At T > 2,000 K, molecules
such as TiO and VO, which are strong absorbers in the
ultraviolet/visible, are expected to be in gas phase and potentially cause
thermal inversions. WASP-12b, now being the hottest planet without
a distinct inversion, presents a major challenge to existing atmospheric
classification schemes for exoplanets based on thermal inversions.
Although there are hints of low chromospheric activity in the host star,
it remains to be seen whether the high incident continuum ultraviolet
flux expected for WASP-12b might be efficient in photo-dissociating
inversion-causing compounds, thus explaining the lack of a strong

Figure 2 | Constraints on the atmospheric composition of WASP-12b.
a-e, The distributions of models fitting the seven observations (Fig. 1) at
different levels of χ² are shown. The coloured dots show χ² surfaces, with each
dot representing a model realization. The purple, red, green, blue and black
colours correspond to models with χ² less than 7, 14, 21 and 28 and χ² > 28,
respectively (χ² ranges between 4.8 and 51.3). Mixing ratios are shown as ratios by
number with respect to H2. At 3σ significance, the constraints on the
composition are H,O/H = 6 X 10 °, CH,/H, =8 X 10 °, CO/ 

H,=6X 10°, CO./H, <5 X10 °, and C/O> 1. The compositions of the 
best-fitting models (with 7? <7) span H,O/H; = 5 X 10 |! to6 X 10 °, CO/ 
H, =3X 10° to3X 10°, CHy/H» = 4X 10 ° to 8 X 10 * and CO,/ 
H,=2X10 ’to7 X10 °; the corresponding ranges in C/O and elemental 
abundances are C/O = 1 to 6.6, C/H=2 X10 °to10 * andO/H=2X10°
to 10 °. The constraints on the C/H and O/H ratios are governed primarily by
the constraints on CO, which is the dominant molecule after H2 and He. On the
basis of thermochemical equilibrium, the inferred CH4/H2 and H2O/H2 mixing
ratios are possible only for C/O ≥ 1, consistent with our detection of C/O ≥ 1.
f, Constraints on the day-night energy redistribution, given by
(1 − A)(1 − f_r), where A is the bond albedo and f_r is the fraction of incident
energy redistributed to the nightside. Up to f_r = 0.45 is possible (for A = 0).
Thus, the observations support very efficient redistribution. An additional
observation in the z′ (0.9 µm) band was reported recently. However, the
observation implies a value for the orbital eccentricity inconsistent with other
data in the literature. We therefore decided to exclude this observation from
the analysis presented here, although including it does not affect our
conclusions regarding the value of C/O or the temperature structure.

Figure 3 | Thermal profiles of WASP-12b. The solid thin lines show profiles
at different degrees of fit (description of colours is as in Fig. 2); only 100
randomly chosen profiles for each χ² level are shown, for clarity. The thick black
solid curve in the front shows a published profile from a self-consistent model
of WASP-12b with a thermal inversion, adapted from ref. 17, which assumes
solar abundances. The thick black dashed curve shows the same model but
without a thermal inversion. If a thermal inversion is present in WASP-12b, it is
expected to be prominent, as shown by the thick solid black curve. A prominent
thermal inversion between 0.01 bar and 1 bar is ruled out by the data at 4σ. The
ostensibly large inversions in the figure are at low pressures (below 0.01 bar),
which have low optical depths, and hence minimal influence on the emergent
spectrum (see Fig. 1). The observations are completely consistent with thermal
profiles having no inversions. Small thermal inversions are also admissible by
the data, and could potentially result from dynamics. The thin black lines show
the condensation curves of TiO at solar (dotted), 0.1 × solar (dashed) and
10 × solar (dash-dotted) compositions.

inversion. Alternatively, the amount of vertical mixing might be insufficient
to keep TiO/VO aloft in the atmosphere to cause thermal inversions.
A C/O = 1 might also yield lower TiO/VO than that required to
cause a thermal inversion. It is unlikely that the TiO/VO in WASP-12b
might be lost to cold traps, given the high temperatures in the deep
atmosphere on the dayside and nightside.
If high C/O ratios are common, then the formation processes and 
compositions of extrasolar planets are probably very different from 
expectations based on Solar System planets. The host star has super- 
solar metallicity but initial analyses find its C/O consistent with 
solar. In the core accretion model, favoured for the formation of 
Jupiter, icy planetesimals containing heavy elements coalesce to form 
the core, followed by gas accretion'®*. The abundances of elemental 
oxygen and carbon are enhanced equally’*”’, maintaining a C/O like 
the star’s. If the host star had a C/O = 1, then the C/O we detect in 
WASP-12b would have been evident. However, if the stellar C/O is 
indeed <1, then the C/O enhancement in WASP-12b’s atmosphere 
would suggest either an unexpected origin for the planetesimals, a local 
overdensity of carbonaceous grains”, or a different formation mech- 
anism entirely. Although carbon-rich giant planets like WASP-12b 
have not been observed, theory predicts myriad compositions for 
carbon-dominated solid planets’”. Terrestrial-sized carbon planets, 
for instance, could be dominated by graphite or diamond interiors, 
as opposed to the silicate composition of Earth'”. If carbon dominates 
the heavy elements in the interior of a hot Jupiter, estimates of mass 
and radius could change compared to those based on solar abundances. 
Future interior models” should investigate the contribution of high 
C/O to the large radius of WASP-12b: that is, 1.75 Jupiter radii for 1.4 
Jupiter masses (ref. 6). 

The observed molecular abundances in the dayside atmosphere of 
WASP-12b motivate the exploration of a new regime in atmospheric 
chemistry. It remains to be seen whether photochemistry in WASP-12b 
can significantly alter the composition in the lower layers of the atmosphere 
at P = 0.01-1 bar; these layers contribute most to the observed 
spectrum (Supplementary Fig. 1). Explaining the observed composition 
as a result of photochemistry with solar abundances would be challenging. 
CH4 is more readily photodissociated than H2O (refs 9 and 27), 
and hence a depletion of CH4 over that predicted with solar abundances 
might be expected, as opposed to the observed enhancement of CH4. 
Apart from the spectroscopically dominant molecules considered in 
this work, minor species such as OH, C,H, and FeH (refs 27 and 28), 
which are not detectable by current observations, could potentially be 
measured with high-resolution spectroscopy in the future. Detection of 
these species would allow additional constraints on equilibrium and 
non-equilibrium chemistry in WASP-12b, although their effect on the 
C/O would be negligible. Models of exoplanetary atmospheres have 
typically assumed solar abundances and/or solar C/O, thereby explor- 
ing a very limited region of parameter space”’*”°. Data sufficient for a 
meaningful constraint on C/O exist for only a few exoplanets. That this 
initial C/O statistical analysis has C/O = 1 potentially indicates a wide 
diversity of planetary compositions. 
Distributed biological computation with 
multicellular engineered networks 


Sergi Regot, Javier Macia, Nuria Conde, Kentaro Furukawa, Jimmy Kjellén, Tom Peeters, Stefan Hohmann, 
Eulalia de Nadal, Francesc Posas & Ricard Solé 


Ongoing efforts within synthetic and systems biology have been 
directed towards the building of artificial computational devices 
using engineered biological units as basic building blocks. Such 
efforts, inspired by the standard design of electronic circuits, are 
limited by the difficulties arising from wiring the basic computational 
units (logic gates) through the appropriate connections, 
each one to be implemented by a different molecule. Here, we show 
that there is a logically different form of implementing complex 
Boolean logic computations that reduces wiring constraints thanks 
to a redundant distribution of the desired output among engineered 
cells. A practical implementation is presented using a library of 
engineered yeast cells, which can be combined in multiple ways. 
Each construct defines a logic function, and combining cells and 
their connections allows more complex synthetic devices to be built. 
As a proof of principle, we have implemented many logic functions 
by using just a few engineered cells. Of note, small modifications 
and combinations of those cells allowed more complex circuits to be 
implemented, such as a multiplexer or a 1-bit adder with carry, 
showing the great potential for reuse of small parts of the 
circuit. Our results support the approach of using cellular consortia 
as an efficient way of engineering complex tasks not easily solvable 
using single-cell implementations. 

Engineered living cells have been designed to perform a broad variety 
of functions but, with few exceptions, complex computational constructs 
(such as comparators, bit adders or multiplexers) are difficult to 
obtain and reuse. Moreover, cell-cell communication requirements 
rapidly grow with circuit complexity, thus limiting the combinatorial 
potential of the constructs. One way of overcoming these difficulties is 
to use cellular consortia, based on the idea that external communication 
between cells in populations involving either single or multiple 
cell types would perform functions that are difficult to implement 
using individual strains. Here, we apply this view to a novel distributed 
approach based on a reusable, sparse design of synthetic circuits. 

A small library of engineered cell types with restricted connections 
among them was generated, each cell responding to one or two inputs 
(Fig. 1a–c). The basic two-input, one-output engineered functions 
include the AND and the inverted IMPLIES (N-IMPLIES, Fig. 1d), 
which allow implementing any Boolean function. Moreover, some 
cells define one-input, one-output functions (Fig. 1e). The output of 
each cell type is either a diffusible wiring molecule or the desired output. 
In contrast with previous works using synthetic consortia, we 
have not used cell-cell feedbacks. Instead, cells only respond to an 
external input and to a single diffusible molecule acting as a wire. 
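
The claim that AND and N-IMPLIES suffice for any Boolean function can be checked with a short sketch. It is only an illustration in idealized Boolean terms, not the authors' construction: it assumes a constant "1" input is available (here the literal True), and the function names are hypothetical.

```python
from itertools import product

def and_gate(a: bool, b: bool) -> bool:
    """Two-input AND, one of the two basic functions named in the text."""
    return a and b

def n_implies(a: bool, b: bool) -> bool:
    """Inverted IMPLIES: true only when a is present and b is absent."""
    return a and not b

def not_gate(a: bool) -> bool:
    # ASSUMPTION: a constant-1 signal is available as the first argument.
    return n_implies(True, a)

def or_gate(a: bool, b: bool) -> bool:
    # De Morgan: a OR b = NOT(NOT a AND NOT b)
    return not_gate(and_gate(not_gate(a), not_gate(b)))

def xor_gate(a: bool, b: bool) -> bool:
    # a XOR b = (a AND NOT b) OR (b AND NOT a)
    return or_gate(n_implies(a, b), n_implies(b, a))

# Print the derived truth tables for inspection.
for a, b in product([False, True], repeat=2):
    print(int(a), int(b), "NOT a:", int(not_gate(a)),
          "OR:", int(or_gate(a, b)), "XOR:", int(xor_gate(a, b)))
```

Since NOT and AND together already generate every Boolean function, recovering NOT from N-IMPLIES is the only step that needs the constant input.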

The computation is determined by: (1) the number of cells C involved, 
(2) the specific function implemented by each engineered cell and (3) the 
location of cells within the network (see Supplementary Information 
and Supplementary Fig. 1 for details). Crucially, we allow different engi- 
neered cells to produce the output signal, which is thus distributed. 


Moreover, each cell can be modulated by external inputs, which can 
either trigger the production of a signal or its inhibition. 

The combinatorial nature of our approach is highlighted by calculating, 
for each C, the number of functions that can be implemented. We 
have analysed all possible functions with two and three inputs versus C 
with our approach (see Supplementary Information and Supplementary 
Fig. 2) and found that most can be constructed using C = 2-5 
different cells. For instance, in response to three inputs, just three cells 
yield more than 100 functions, and four cell types yield more than 200. 
The number of extracellular wires using this approach is significantly 
lower than in other standard approaches (Supplementary 
Fig. 3 and Supplementary Information). With three inputs, over 100 
different logical functions can be achieved with only two wires and 
almost all are obtained with just three to four wires (Supplementary 
Fig. 3). 
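
For scale, the counts quoted above can be set against the total number of distinct Boolean functions of n inputs, which is 2^(2^n). The sketch below computes only that standard total; it does not reproduce the authors' enumeration of which functions a given number of cells can reach, and the function name is hypothetical.

```python
def count_boolean_functions(n_inputs: int) -> int:
    """Number of distinct truth tables over n_inputs binary inputs.

    A truth table has 2**n_inputs rows, and each row's output can be
    0 or 1 independently, giving 2**(2**n_inputs) possible functions.
    """
    return 2 ** (2 ** n_inputs)

for n in (2, 3):
    print(n, "inputs:", count_boolean_functions(n), "possible functions")
```

With three inputs the total is 256, so "more than 100" and "more than 200" functions correspond to a large share of the whole space.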

As a proof of principle that distributed computation can be imple- 
mented in vivo, we created a library of engineered yeast cells. Each cell 
Figure 1 | Basic engineered cells (cell types). a, b, Cells can receive signals 
from other cells (IN) and external sources (E) (a) or just from external sources 
(b). Cells can also produce diffusive output molecules. c, Summary of the 
cell behaviour, where each cell C_ij responds to two different 
inputs: an external input (x_j) and a signalling molecule (wire) from another cell 
(α_{i−1}). The response of the k-th cell type can be the production of a new 
wiring molecule (α_i) or the final output. The k-th cells respond to the presence 
of signals through some Boolean function, with k = {1,2,3,4}, defining the 
resulting Boolean output. d, e, The four basic functions implemented in our 
study are displayed. 
responds to an extracellular stimulus (for example, NaCl, doxycycline, 
galactose, oestradiol) and/or the presence of a wiring molecule (for 
example, yeast pheromone). The output of the cells was monitored as 
the expression of a reporter construct under the control of the FUS1 
promoter (for example, green fluorescent protein, GFP). See Supplementary 
Fig. 4 for the relevant genotype and the logic function of each 
cell of the library. The ability of cells to respond to external stimuli 
(inputs) was monitored by fluorescence in single cells (fluorescence- 
activated cell sorting, FACS) and normalized to the maximal number of 
cells able to produce output signal (see Fig. 2a and Supplementary 
Information). Each cell type has been characterized by its ability to 
respond to the corresponding stimuli (Fig. 2b and Supplementary 
Fig. 5). 

We then implemented all standard 2-input logic functions by com- 
bining just a few engineered cell types. We initially designed a basic 
circuit with an AND logic (Fig. 3a) involving two cell types responding 
to two stimuli (NaCl and oestradiol) and using a pheromone (alpha 
factor) as a wiring molecule. The presence of NaCl stimulates Cell 1 to 
produce pheromone (IDENTITY) that is received by Cell 2. In addi- 
tion, Cell 2 has the ability to sense another external input (oestradiol) 
and it is competent, via the production and activation of the Fus3 
mitogen-activated protein kinase (MAPK), to produce the final output. 
The final output was produced only in the presence of both inputs 
(Fig. 3a). Similarly, a NOR gate was implemented using a 
different pair of cell types in which each cell responded to a particular 
stimulus (doxycycline and 6a, an inhibitor of Fus3as kinase) with yeast 
pheromone as the wiring molecule. A positive output was obtained only in the 
absence of both stimuli (Fig. 3b). 

Next, we designed two completely different circuits that involved 
the use of three independent engineered cell types, by reusing cells 
from our previous AND and NOR circuits. The first three-cell circuit 
was an OR logic gate in which the two inputs are NaCl and galactose. In 
this circuit, engineered Cell 1 and 5 are IDENTITY functions. They 
respond to the presence of NaCl (input 1) or galactose (input 2) to 
produce the wiring molecule that induces output production in Cell 6 
(GFP). The presence of any input (galactose or NaCl) generated a 
positive output as it corresponds to an OR gate (Fig. 3c). Similarly, a 
NAND gate was designed using doxycycline and glucose as inputs. 
Cell 3 and Cell 5 display NOT logic. Both secreted pheromone in the 
absence of stimuli. Cell 6 responded to the presence of pheromone 
from either Cell 3 or Cell 5 by inducing a fluorescent output. As expected 
for a NAND gate, only the presence of both stimuli suppressed the output (Fig. 3d). This 
illustrates how to increase computational complexity at low cost. 
Other circuits can be easily built through reuse (Supplementary Fig. 
6). Of note, the N-IMPLIES circuit can be implemented in a single cell 
(Supplementary Fig. 6a) or by combining cells with different logics 
(Supplementary Fig. 6b). Using other consortia, we obtained the AND, 
NOR, OR, NAND, XNOR and XOR gates. However, they can be 
Figure 2 | In vivo analyses of engineered cells. a, Quantification of single-cell 
computational output. Truth table and schematic representation of a cell with a 
NOT logic (see Supplementary Information for complete genotype). The NOT 
function is implemented in Cell 3, and the reporter cell (Cell 6) is used to 
quantify alpha factor production in vivo. Doxycycline (DOX) was added as 
indicated and cells were analysed by FACS. Data are expressed as the 
percentage of GFP-positive cells versus cells treated with pheromone. Results 
represent the mean ± s.d. of three independent experiments. b, Transfer 
functions of basic logic cells. Schematic representation of cells implementing 
N-IMPLIES, AND, IDENTITY and NOT functions. Indicated cells were 
treated with indicated input concentrations (2 inputs, left; 1 input, right). 17β-E2, 
oestradiol; GLU, glucose. 
Figure 3 | Engineered cells to implement different logic gates in vivo. 

a, Truth table and schematic representation of cells in the AND circuit (see 
Supplementary Information for complete genotype). Cells were mixed 
proportionally and inputs (NaCl and oestradiol) were added at the same time. 
b, Panel ordered as in (a) following NOR logic. Indicated cells were treated 


implemented using the same inputs for all of them (that is, doxycycline 
and glucose) (Supplementary Fig. 7a-f). This supports that our 
approach is adaptable and that multiple functions can be constructed 
from a small library of reusable cells. 
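
The four consortia described above (AND, NOR, OR, NAND) can be sketched as compositions of single-cell functions joined by one shared pheromone wire. This is an idealized Boolean model under stated assumptions, not the authors' implementation: concentrations, thresholds and dynamics are ignored, the shared wire is modelled as a logical OR of all secreting cells, and the split of roles inside the NOR pair (a NOT cell feeding an N-IMPLIES cell) is one reading of the text.

```python
from itertools import product

# Single-cell logic types named in the text (idealized as Boolean functions).
def identity(x: bool) -> bool:
    return x

def not_(x: bool) -> bool:
    return not x

def and_(external: bool, wire: bool) -> bool:
    return external and wire

def n_implies(wire: bool, external: bool) -> bool:
    return wire and not external

def and_circuit(nacl: bool, oestradiol: bool) -> bool:
    pheromone = identity(nacl)           # Cell 1 secretes pheromone on NaCl
    return and_(oestradiol, pheromone)   # Cell 2 needs oestradiol and pheromone

def nor_circuit(dox: bool, inhibitor_6a: bool) -> bool:
    # ASSUMPTION: first cell is a NOT cell for doxycycline, second cell
    # reports pheromone unless the Fus3as inhibitor 6a blocks signalling.
    pheromone = not_(dox)
    return n_implies(pheromone, inhibitor_6a)

def or_circuit(nacl: bool, galactose: bool) -> bool:
    pheromone = identity(nacl) or identity(galactose)  # Cells 1 and 5 share the wire
    return identity(pheromone)                         # Cell 6 reports the wire

def nand_circuit(dox: bool, glucose: bool) -> bool:
    pheromone = not_(dox) or not_(glucose)             # Cells 3 and 5 share the wire
    return identity(pheromone)                         # Cell 6 reports the wire

for a, b in product([False, True], repeat=2):
    print(int(a), int(b),
          "AND:", int(and_circuit(a, b)), "NOR:", int(nor_circuit(a, b)),
          "OR:", int(or_circuit(a, b)), "NAND:", int(nand_circuit(a, b)))
```

In this model the OR and NAND circuits differ only in which cells feed the shared wire, which is the sense in which the reporter cell (Cell 6) is reused.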

We then analysed the long-term dynamical response of our AND 
circuit under changing inputs (Supplementary Fig. 7a). We have found 
that once the circuit is turned on, it can maintain maximal signal for 
periods beyond 9h in the presence of stimuli (Supplementary Fig. 8a). 
Furthermore, once the system has been established, it responds equally 
well for at least eight generations while the culture is maintained in log 
phase (Supplementary Fig. 8b). We have also experimentally addressed 
with doxycycline and 6a as inputs. c, OR gate. Indicated cells were treated 
with 0.4 M NaCl and 2% galactose (GAL) as inputs. d, NAND gate. Indicated 
strains were treated with doxycycline and 2% glucose as inputs. Data represent 
the mean and standard deviation of three independent experiments. 


network responsiveness to dynamic changes by means of a microfluidic 
device containing the cellular types of the AND circuit and have 
exposed it to changes in the input signals over time. The system was 
able to dynamically respond to re-stimulation after GFP inactivation 
(Supplementary Fig. 8c). 

Our system can be selectively switched off and partially reprogrammed. 
For instance, the inhibition of the intracellular signal transduction 
in Cell 2 of the AND gate blocks the positive outcome of the 
circuit (Supplementary Fig. 9a). More interestingly, when reprogramming 
is applied to complex multicellular circuits, different computations 
can be obtained with little effort. For instance, when a reprogramming 
molecule (glucose) is added to the OR gate shown in Supplementary Fig. 
9b, it works as IDENTITY for NaCl (input 2). In this context, despite the 
fact that multicellular circuits might seem more complex, their easy 
reuse and combination actually make them more appropriate in many 
situations. 

Finally, we were able to engineer complex circuits by reusing our 
previous designs. One of them is the multiplexer MUX2to1 circuit, which 
selects one of several input signals and forwards the selected input 
to a single output. This circuit, if designed in a single cell, would be 
difficult to implement in vivo (see Supplementary Fig. 10a). However, 
using distributed computation, the circuit can be assembled from just 
three engineered cell types responding to three input signals and a 
single wiring molecule (Supplementary Fig. 10b). In addition, we also 
implemented a MUX2to1 circuit that contains four cell types but uses 
two independent wiring molecules (α-factor from Saccharomyces 
cerevisiae and the α-factor from Candida albicans). Cell 10 and 
Cell 13 respond to doxycycline and each produces one of the wiring 
molecules. Cell 12 responds to oestradiol and S. cerevisiae pheromone 
whereas Cell 15 responds to galactose and C. albicans pheromone. The 
final output (GFP) is generated by Cell 12 and Cell 15. Here, although 
the complexity of the circuit required a differential output to eight 
different input combinations, the in vivo results clearly showed that 
the computation of the three inputs yielded the expected response 
(Fig. 4a). A second complex circuit, the 1-bit adder with carry, was 
built by combining XOR and AND gates that respond to the same 
inputs (doxycycline and glucose) with two wiring molecules (α-factor 
from S. cerevisiae and C. albicans). In addition, output cells express 
different reporter proteins, a green reporter (adder) or a red reporter 
(carry) (FUS1::GFP or FUS1::mCherry, respectively), allowing the 
outcome of the carry and adder to be detected in the same culture. The system 
responds as an XOR gate (green columns), but the presence of the two 
stimuli leads to a 1-bit carry (red columns) (Fig. 4b). 
Possible applications as well as some caveats will need to be 
addressed in future work (such as scalability, strategies for reducing 
Figure 4 | Design and in vivo implementation of a multiplexer (MUX2to1) 
and 1-bit adder with carry. a, Truth table and schematic representation of the 
cells used in the MUX2to1. Indicated cells were treated using doxycycline 
(selector) and the inputs oestradiol and/or 2% galactose. Data are expressed as 
the percentage of GFP-positive cells using a sample treated with either S. 
cerevisiae or C. albicans alpha factor as a reference for Cell 12 or Cell 15, 
respectively. b, Truth table and schematic representation of cells used for 1-bit 
adder with carry. Four cells with two wiring systems that respond to glucose and 
doxycycline with an XOR logic were combined with an extra cell that responds 
to the same stimuli but with an AND logic, in which mCherry instead of GFP was 
expressed as output. The final outcome was measured as in Fig. 3a. Green bars 
indicate the adder output (GFP) whereas red bars represent the carry bit 
(mCherry). GFP and mCherry images of cells are shown (right panels). Data 
represent the mean and standard deviation of three independent experiments. 
potential crosstalk and robustness to noise). However, the results 
reported here show that distributed computation using consortia is a 
powerful strategy to build complex synthetic constructs, opening a 
new door to reusable, reprogrammable complex circuits (Supplemen- 
tary Fig. 11). 
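
The two complex circuits described above can be written as a minimal Boolean sketch. It rests on assumptions that go beyond the text: which of Cell 10 and Cell 13 secretes its wire in the absence rather than the presence of doxycycline is not stated here, so the assignment below is hypothetical, and the XOR stage of the adder is collapsed into a single expression instead of modelling its four cells.

```python
from itertools import product

def mux2to1(dox_selector: bool, oestradiol: bool, galactose: bool) -> bool:
    """Four-cell multiplexer with two independent wires (idealized)."""
    # ASSUMPTION: Cell 10 secretes S. cerevisiae alpha-factor without doxycycline,
    # Cell 13 secretes C. albicans alpha-factor with doxycycline.
    wire_sc = not dox_selector
    wire_ca = dox_selector
    out_cell12 = oestradiol and wire_sc   # Cell 12: oestradiol AND S. cerevisiae wire
    out_cell15 = galactose and wire_ca    # Cell 15: galactose AND C. albicans wire
    return out_cell12 or out_cell15       # GFP comes from either output cell

def one_bit_adder(dox: bool, glucose: bool) -> tuple[bool, bool]:
    """Return (sum, carry): sum is the XOR readout (GFP), carry the AND readout (mCherry)."""
    total = dox != glucose
    carry = dox and glucose
    return total, carry

for s, a, b in product([False, True], repeat=3):
    print("selector", int(s), "inputs", int(a), int(b), "->", int(mux2to1(s, a, b)))

for a, b in product([False, True], repeat=2):
    total, carry = one_bit_adder(a, b)
    print(int(a), "+", int(b), "= sum", int(total), "carry", int(carry))
```

The multiplexer loop covers the eight input combinations mentioned in the text; the adder output read as a two-bit number (carry, sum) equals the arithmetic sum of the two inputs.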


METHODS SUMMARY 


A complete list of engineered yeast strains and plasmids is provided in the 
Supplementary Information. Computational output was detected in single cells by flow 
cytometry (FACScalibur, Becton Dickinson), and the dynamic responses of the 
different circuits to inputs were analysed in a microscopy-based microfluidic platform. 
See Supplementary Information for details about experimental methods. 
Images of a fourth planet orbiting HR 8799 


Christian Marois, B. Zuckerman, Quinn M. Konopacky, Bruce Macintosh & Travis Barman 


High-contrast near-infrared imaging of the nearby star HR 8799 
has shown three giant planets. Such images were possible because 
of the wide orbits (>25 astronomical units, where 1 AU is the Earth-Sun 
distance) and youth (<100 Myr) of the imaged planets, which 
are still hot and bright as they radiate away gravitational energy 
acquired during their formation. An important area of contention 
in the exoplanet community is whether outer planets (>10 AU) more 
massive than Jupiter form by way of one-step gravitational instabilities 
or, rather, through a two-step process involving accretion of a 
core followed by accumulation of a massive outer envelope composed 
primarily of hydrogen and helium. Here we report the presence 
of a fourth planet, interior to and of about the same mass as the 
other three. The system, with this additional planet, represents a 
challenge for current planet formation models as none of them can 
explain the in situ formation of all four planets. With its four young 
giant planets and known cold/warm debris belts, the HR 8799 
planetary system is a unique laboratory in which to study the formation 
and evolution of giant planets at wide (>10 AU) separations. 
New near-infrared observations of HR 8799, optimized for detecting 
close-in planets, were made at the Keck II telescope in 2009 and 2010. 
(See Table 1 for a summary.) A subset of the images is presented in 
Fig. 1. A fourth planet, designated HR 8799e, is detected at six different 
epochs at an averaged projected separation of 0.368″ ± 0.003″ 
(14.5 ± 0.4 AU). Planet e is bound to the star and is orbiting anticlockwise 
(see Fig. 2), as are the three other known planets in the system. The 
measured orbital motion, 46 ± 10 mas yr⁻¹, is consistent with a roughly 
circular orbit of semimajor axis (a) 14.5 AU with a ~50-year period. 
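
The numbers in the paragraph above can be cross-checked with a short calculation. This is a rough sketch, not the authors' analysis: the 39.4 ± 1.0 pc distance is taken from the Fig. 1 caption, the orbit is treated as circular and face-on, and the stellar mass of 1.5 solar masses is an assumption that does not appear in the text.

```python
import math

DISTANCE_PC = 39.4        # distance to HR 8799 (pc), from the Fig. 1 caption
DISTANCE_ERR_PC = 1.0
SEP_ARCSEC = 0.368        # mean projected separation of planet e (arcsec)
SEP_ERR_ARCSEC = 0.003
STELLAR_MASS_MSUN = 1.5   # ASSUMPTION: stellar mass is not given in the text

def projected_separation_au(sep_arcsec: float, distance_pc: float) -> float:
    """Small-angle relation: 1 arcsec at 1 pc corresponds to 1 AU."""
    return sep_arcsec * distance_pc

def circular_period_yr(a_au: float, stellar_mass_msun: float) -> float:
    """Kepler's third law in solar units, P^2 = a^3 / M (planet mass neglected)."""
    return math.sqrt(a_au ** 3 / stellar_mass_msun)

a_au = projected_separation_au(SEP_ARCSEC, DISTANCE_PC)
# Combine the astrometric and distance uncertainties in quadrature.
a_err_au = math.hypot(SEP_ERR_ARCSEC * DISTANCE_PC, SEP_ARCSEC * DISTANCE_ERR_PC)
period_yr = circular_period_yr(a_au, STELLAR_MASS_MSUN)
# Face-on circular orbit: the planet covers 2*pi*separation per period.
motion_mas_yr = 2 * math.pi * SEP_ARCSEC * 1000 / period_yr

print(f"separation: {a_au:.1f} +/- {a_err_au:.1f} AU")
print(f"period: {period_yr:.0f} yr, orbital motion: {motion_mas_yr:.0f} mas/yr")
```

Under these assumptions the separation comes out at about 14.5 ± 0.4 AU, the period at roughly 45 years and the orbital motion near 51 mas per year, which sits within the measured 46 ± 10 mas per year.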
Knowledge of the age and luminosity of the planets is critical for 
deriving their fundamental properties, including mass. In 2008 we 
used various techniques to estimate an age of 60 Myr with a plausible 


Table 1 | HR 8799e astrometry, photometry and physical characteristics 

Epoch, band, wavelength                        Separation [E, N] from the host star 
2009 Jul. 31, Kp band 2.124 μm (±0.019″)       [−0.299″, −0.217″] 
2009 Aug. 1, L′ band 3.776 μm (±0.013″)        [−0.303″, −0.209″] 
2009 Nov. 1, L′ band 3.776 μm (±0.010″)        [−0.304″, −0.196″] 
2010 Jul. 13, Ks band 2.146 μm (±0.008″)       [−0.325″, −0.173″] 
2010 Jul. 21, L′ band 3.776 μm (±0.011″)       [−0.324″, −0.175″] 
2010 Oct. 30, L′ band 3.776 μm (±0.010″)       [−0.334″, −0.162″] 

Parameter                                              Value 
Projected separation, avg. from all epochs* (AU)       14.5 ± 0.4 
Orbital motion (arcsec yr⁻¹)                           0.046 ± 0.010 
Period for a face-on circular orbit (yr)               ~50 
ΔKs 2.146 μm† (mag)                                    0.67 ± 0.22 
ΔL′ 3.776 μm† (mag)                                    9.37 ± 0.12 
Absolute magnitude at 2.146 μm, M_Ks (mag)             2.93 ± 0.22 
Absolute magnitude at 3.776 μm, M_L′ (mag)             1.61 ± 0.12 
Luminosity (log L⊙)                                    −4.7 ± 0.2 
Mass for 30 Myr age (MJup)                             7 
Mass for 60 Myr age (MJup)                             10 

*The projected separation error (in AU) also accounts for the uncertainty in the distance to the star. 
†Planet-to-star flux ratios, expressed as difference of magnitude. No reliable photometry was derived 
for the Kp-band 2009 Jul. 31 data. 


range between 30 and 160 Myr (here we represent this as 60 (+100/−30) Myr), 
consistent with an earlier estimate of 20-150 Myr (ref. 5). Two recent 
analyses (R. Doyon et al., and B. Zuckerman et al., manuscripts in 
preparation) independently deduce that HR 8799 is very likely to be 
a member of the 30 Myr Columba association. This conclusion is 
based on common Galactic space motions and age indicators for stars 
located between the previously known Columba members and HR 
8799. The younger age suggests smaller planet masses, but to be conservative, 
we use both age ranges (the ~30 Myr Columba association age 
and 60 (+100/−30) Myr) to derive the physical properties of planet e. 



Figure 1 | HR 8799e discovery images. Images of HR 8799 (a star at 
39.4 ± 1.0 pc and located in the Pegasus constellation) were acquired at the 
Keck II telescope with the Angular Differential Imaging technique (ADI) to 
allow a stable quasi-static point spread function (PSF) while leaving the field of 
view to rotate with time while tracking the star in the sky. The ADI/LOCI 
SOSIE software was used to subtract the stellar flux, and to combine and flux- 
calibrate the images. Our SOSIE software iteratively fits the planet PSF to 
derive relative astrometry and photometry (the star position and its 
photometry were obtained from unsaturated data or from its PSF core that was 
detectable through a flux-calibrated focal plane mask). a, An L′-band image 
acquired on 21 July 2010; b, a Ks-band image acquired on 13 July 2010 (arrows 
in a and b point towards planet e); c, an L′-band image acquired on 1 November 
2009. All three sequences were ~1 h long. No coronagraphic focal plane mask 
was used on 1 November 2009, but a 400-mas-diameter mask was used on 13 
July and 21 July 2010. HR 8799e is located southwest of the star. Planets b, c and 
d are seen at respective projected separations of 68, 38 and 24 AU from the 
central star, consistent with roughly circular orbits at inclinations of <40° (refs 
11-13). Their masses (7, 10 and 10 MJup for b, c and d for 60 Myr age; 5, 7 and 
7 MJup for 30 Myr age) were estimated from their luminosities using age- 
dependent evolutionary models. North is up and east is left. 
Figure 2 | HR 8799e 2009-10 astrometry. Main figure, the 2009-10 orbital 
motions of the four planets: b, c, d and e. Crosses denote the positions for the 2009 and 
2010 first and last epochs for b, c and d, and for all six epochs for e. A square is 
drawn over the cross symbol of each planet's first epoch. Inset, a zoomed 
version of planet e's astrometry, including the expected motion (curved solid 
line) if it is an unrelated background object; each epoch is labelled by a number 
1-6; a dashed line connects the star to each epoch data point; error bars, ±1 s.d. 


HR 8799e is located very near planets c and d in a Ks versus Ks − L′ 
colour-magnitude diagram, suggesting that all three planets have similar 
spectral shapes and bolometric luminosities. We, therefore, adopt 
the same luminosity for these three planets; however, given the larger 
photometric error bars and sparse wavelength coverage associated 
with planet e, we have conservatively assigned to it a luminosity error 
(Table 1) twice as large as those for planets b, c and d. This luminosity 
estimate is consistent with empirically calibrated bolometric corrections 
for brown dwarfs, although such corrections may be ill-suited 
Planet e is confirmed as bound to HR 8799, and it is moving at 
46 ± 10 mas yr⁻¹ anticlockwise. In the main figure, the orbits of the giant 
planets of our Solar System (Jupiter, Saturn, Uranus and Neptune) are drawn to 
scale (light grey circles). With a period of ~50 years, the orbit of HR 8799e will 
be rapidly constrained by future observations; at our current measurement 
accuracy, it will be possible to measure orbital curvature after only 2 years. 


for young planets with distinct spectra and colours. Using the two 
overlapping age ranges outlined above and the evolutionary models 
described in the HR 8799bcd discovery article, we estimate the mass of 
planet e to be about 7 MJup (30 Myr) and about 10 MJup (60 Myr), where MJup 
is the mass of Jupiter; see Fig. 3. The broadband photometry of planets 
b, c and d provides strong evidence for significant atmospheric cloud 
coverage, while recent spectroscopy of planets b and c shows evidence 
for non-equilibrium CO/CH4 chemistry. Given the limited wavelength 
coverage of the discovery images for planet e, it is too early to 


Figure 3 | The mass of HR 8799e from the age-luminosity relationship. 
Solid lines are luminosity-versus-age tracks for planet evolution models 
(luminosities are normalized to the solar luminosity, L⊙). Objects above 13 
MJup are typically considered to be outside the planet-mass regime; however, 
the tail end of the planet distribution found by radial velocity surveys extends 
above this IAU-defined mass limit. Boxed areas show adopted luminosity 
ranges (±1 s.d.) and estimated age ranges for the four HR 8799 planets: cross- 
hatched boxes show the ~30 Myr (Columba association) age range; grey boxes show the age range 
60 (+100/−30) Myr; planets c, d and e have similar luminosities, but the luminosity 
uncertainty for e is larger and indicated by the darker box/opposite hatch. For 
comparison, the ages and luminosities of four recently imaged planet-mass 
companions near other stars are indicated (numbered 1-4; see key on figure), 
showing 1 s.d. error bars for the luminosity and estimated age ranges. An 
asteroseismology study suggested that the HR 8799 system might be as old as 
~1 Gyr (ref. 27), but it is highly unlikely that such an old star would have very 
massive debris belts; such an age would also require planetary masses far too 
high for long-term stability. The older age also requires an inclination of the 
stellar pole relative to the line of sight of ~50°, inconsistent with the nearly face- 
on planetary system and the ~25° inclination upper limit measured from 
Spitzer images of the outer dust halo. Mass estimates based on any existing 
evolutionary model at ages as young as 20-30 Myr suffer from unconstrained 
initial formation conditions; the masses presented here could be 
underestimated if the planets formed by core-accretion, though 'cold start' 
core-accretion models do not reproduce the observed luminosity for any 
combination of mass and age. While this additional uncertainty can lead 
temporarily to ambiguity about the planets' masses and formation history 
(core-accretion or gravitational instability), it does highlight the importance of 
discovering and following in orbit planet-mass companions at ages when 
formation processes are important. 
Figure 4 | Comparison of HR 8799 and our Solar System. Top, Solar System; 
bottom, the HR 8799 system. HR 8799 infrared data indicate the existence of an 
asteroid belt analogue located at 6-15 AU (we have moved the estimated outer 
edge of this belt to 10 AU because of planet e's estimated chaotic region), an 
Edgeworth-Kuiper-belt-like debris disk at >90 AU and a small particle halo 
extending up to 1,000 AU (ref. 4). The red shaded regions represent the locations 
of the inner and outer debris belts in both systems (the Solar System Oort comet 
cloud and the HR 8799 halo are not shown). The horizontal axis of the HR 8799 
plot is compressed by the square root of the ratio of the luminosity of HR 8799 
(4.92 ± 0.41 L⊙) to that of the Sun to show both systems over the same 
equilibrium temperature range. Given the current apparent separations of the 


say much about the atmospheric properties of this particular planet; 
however, given that its near-infrared colour is similar to those of the 
other three planets, we can anticipate similar cloud structure and 
chemistry for planet e. 

Stability analyses have shown that the original three-planet system
may be in a mean motion period resonance with an upper limit on
planetary masses of ~20 MJup assuming an age of up to 100 Myr. With
the discovery of a fourth planet, we revisit the stability of this system. We
searched for stable orbital configurations with the HYBRID/Mercury
package using the 30-Myr (5, 7, 7 and 7 MJup for b, c, d and e respectively)
and 60-Myr (7, 10, 10 and 10 MJup) masses. In our preliminary
search, we held the parameters for b, c and d fixed to those matching
either the single resonance (1:2 resonance between planets c and d only)
or double resonance (1:2:4 resonance between planets b, c and d) stable
solutions found to date, but allowed the parameters for e to vary within
the regime allowed by our observations. On the basis of the single-resonance
configuration and using the 30-Myr masses, in 100,000 trials
seven solutions for e were found that are stable for at least 160 Myr (the
maximum estimated age of the system), and an additional five solutions
were found that are stable for over 100 Myr. All maximally stable solutions
have a semimajor axis of ~14.5 au, with planets c, d and e in a 1:2:4
resonance (planet b not in resonance). A set of 100,000 trials was also
performed using the 60-Myr masses, but only two solutions were found
that are stable for over 100 Myr, each of which requires a semimajor axis
of ~12.5 au, 4σ away from our astrometry. This suggests that a
younger system age and lower planet masses are preferred, although a
much more thorough search of parameter space is required (see the
Supplementary Information for tables of stable solutions).

The mechanism for the formation of this system is unclear. It is 
challenging for gravitational-instability fragmentation to occur at
a < 20-40 au (refs 15,16), ruling that mechanism out for in situ


four planets of HR 8799 and the preferred locations of the inner warm debris 
disk and the inner edge of the outer cold disk (90 au), then (1) the indicated 4:1
and 2:1 period resonances between the inner/outer edge of the warm debris belt 
and planet e, and (2) a 3:2 mean motion resonance of b with the inner edge of 
the outer cold disk, are both consistent with the observations. By analogy, the 
inner and outer edges of the main asteroid belt of our Solar System are, 
respectively, in 4:1 and 2:1 mean motion resonances with Jupiter. Many 
members of the Edgeworth-Kuiper belt, including Pluto, are in a 3:2 mean
motion orbital resonance with Neptune. Solar System planet images are from 
NASA; HR 8799 artwork is from Gemini Observatory and L. Cook. Planet 
diameters are not to scale. 
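
The resonance locations named in this caption can be checked with Kepler's third law (orbital period P ∝ a^1.5). The following is a minimal sketch and not part of the original analysis: the function name `resonance_radius` is hypothetical, the semimajor axes of Jupiter (5.2 au) and Neptune (30.1 au) are standard rounded values, and the values for planets e (14.5 au) and b (68 au) are the approximate separations quoted for the system, treated here as circular-orbit semimajor axes (an assumption).

```python
def resonance_radius(a_planet_au, body_orbits, planet_orbits):
    """Semimajor axis (au) of a small body that completes `body_orbits`
    orbits in the time the planet completes `planet_orbits` orbits."""
    # Kepler's third law: P is proportional to a**1.5, so
    # a_body / a_planet = (P_body / P_planet)**(2/3).
    period_ratio = planet_orbits / body_orbits
    return a_planet_au * period_ratio ** (2.0 / 3.0)

# Solar System analogy (assumed rounded semimajor axes).
belt_inner_sun = resonance_radius(5.2, 4, 1)    # 4:1 with Jupiter
belt_outer_sun = resonance_radius(5.2, 2, 1)    # 2:1 with Jupiter
pluto_like = resonance_radius(30.1, 2, 3)       # 3:2 with Neptune

# HR 8799 (approximate separations, assumed to be semimajor axes).
belt_inner_hr = resonance_radius(14.5, 4, 1)       # 4:1 with planet e
belt_outer_hr = resonance_radius(14.5, 2, 1)       # 2:1 with planet e
cold_disk_inner_hr = resonance_radius(68.0, 2, 3)  # 3:2 with planet b

print(f"Solar System belt: {belt_inner_sun:.1f}-{belt_outer_sun:.1f} au; "
      f"3:2 with Neptune: {pluto_like:.1f} au")
print(f"HR 8799 warm belt: {belt_inner_hr:.1f}-{belt_outer_hr:.1f} au; "
      f"3:2 with b: {cold_disk_inner_hr:.1f} au")
```

Under these assumptions the sketch gives roughly 5.8 and 9.1 au for the edges of the warm belt and about 89 au for the inner edge of the cold disk, in line with the 6-10 au and 90 au values given in the caption.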


formation of planet e. In addition, disk instability mechanisms preferentially
form objects more massive than these planets. If the HR
8799 system represented low-mass examples of such a population,
brown-dwarf companions to young massive stars would be plentiful.
Nearby young star surveys and our nearly complete survey of 80
stars with similar masses and ages to HR 8799 have discovered no such
population of brown-dwarf companions. HR 8799e and possibly d are
close enough to the primary star to have formed by bottom-up accretion
in situ, but planets b and c are located where the collisional
timescale is conventionally thought to be too low for core accretion 
to form giant planets before the system’s gas is depleted. A hybrid 
process with different planets forming through different mechanisms 
cannot be ruled out, but seems unlikely with the similar masses and 
dynamical properties of the four planets. It is possible that one mech- 
anism dominated the other and the planets later migrated to their 
current positions. The HR 8799 debris disk is especially massive for 
a star of its age (or for any older main-sequence star), which could
indicate an extremely dense protoplanetary disk. Such a disk could 
have induced significant migration, moving planets formed by disk- 
instability inward, or the disk could have damped the residual eccent- 
ricity from multi-planet gravitational interactions that moved core- 
accretion planets outward. The massive debris disk and the lack of 
higher-mass analogues to this system do suggest that HR 8799 repre- 
sents the high-mass end of planet formation. 

The HR 8799 system does show interesting similarities with our 
Solar System; all giant planets are located past the estimated snow line 
of each system (~2.7 au for the Solar System, ~6 au for HR 8799), and
the debris belts of each system are located at similar equilibrium tem- 
peratures (Fig. 4). With its four massive planets, massive debris belts 
and large scale, the HR 8799 planetary system is an amazing example 
of extreme systems that can form around stars. 
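
The comparison at equal equilibrium temperature rests on a simple scaling: for a fixed albedo, equilibrium temperature varies as L^(1/4) d^(-1/2), so a given temperature is reached at a distance proportional to the square root of the stellar luminosity. The sketch below is illustrative only; the function name `equivalent_distance` is hypothetical, and applying the same scaling to the snow line is an assumption rather than a statement of how the ~6 au value was derived.

```python
import math

L_HR8799 = 4.92       # luminosity in solar units, from the Figure 4 caption
L_HR8799_ERR = 0.41   # quoted uncertainty, solar units

def equivalent_distance(d_solar_au, luminosity_lsun):
    """Distance (au) from a star of the given luminosity at which the
    equilibrium temperature equals that at `d_solar_au` from the Sun."""
    # T_eq scales as L**0.25 / d**0.5, so equal T_eq requires d ~ sqrt(L).
    return d_solar_au * math.sqrt(luminosity_lsun)

# Solar System snow line (~2.7 au) mapped to HR 8799.
snow_line = equivalent_distance(2.7, L_HR8799)
snow_line_lo = equivalent_distance(2.7, L_HR8799 - L_HR8799_ERR)
snow_line_hi = equivalent_distance(2.7, L_HR8799 + L_HR8799_ERR)

print(f"scale factor: {math.sqrt(L_HR8799):.2f}")
print(f"snow line: {snow_line:.1f} au ({snow_line_lo:.1f}-{snow_line_hi:.1f} au)")
```

The scale factor is about 2.2, which places the equivalent of the 2.7 au Solar System snow line at close to 6 au around HR 8799.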
EXTRASOLAR PLANETS


A giant surprise 


The discovery of an inner giant planet in the unusually massive solar system around the star HR 8799 creates an ensemble 
of planets that is difficult to explain with prevailing theories of planet formation. 


LAIRD CLOSE 


The solar system around the
star HR 8799 should not exist.

This system is unlike any other 
known: it is a massive system that has 
multiple massive planets, with each 
giant planet containing many times 
the mass of all the planets in our Solar 
System combined. However, in a paper 
published online in Nature today, 
Marois and collaborators present new
images of HR 8799 in which yet another 
equally massive planet is visible. 

Previous work had imaged three
planets around HR 8799, and now we
have the surprise discovery of a fourth,
HR 8799e, an inner, massive planet 
(about 10 Jupiter masses) located some 
14.5 astronomical units from the star 
(1 au is the average distance from Earth 
to the Sun). One might question the 
importance of the discovery of another 
extrasolar planet when more than 500 
are known. But the HR 8799 system is 
the only solar system known to have 
multiple outer planets (the other three 
planets, HR 8799b, HR 8799c and HR 
8799d, orbit respectively at approxi- 
mately 68, 38 and 24 au from the host 
star, and have estimated masses of 
about 7, 10 and 10 Jupiters). 

As HR 8799 is the only known example
of a wide (greater than 25 au) solar
system with multiple giant planets, 
astronomers were curious to know whether 
the star’s planets could have formed by gravitational
collapse, one of the most popular
theories of outer-planet formation. This theory 
posits that outer giant planets form from the 
fragmentation of the disk of gas and dust that 
develops around stars when they are young. In 
a process rather like the way binary stars form, 
a gravitational instability in the disk fragments 
it and quickly (on a timescale of 10,000 years) 
leads to the formation of gas-giant planets.
But the discovery of an inner planet such as
HR 8799e at 14.5 au poses a tricky puzzle. At
this distance, the disk was neither cold enough
nor rotating slowly enough to fragment and
undergo gravitational collapse in situ to form
HR 8799e.

To explain the formation of this latest planet, 


Figure 1 labels: cold disk, rapid planet formation by gravitational collapse; warm disk, slow planet formation by core accretion; Jupiter’s orbit around the Sun (for scale).

Figure 1 | The HR 8799 planetary system. When star HR 8799 
formed, a massive circumstellar disk of gas and dust probably 
existed from which the star’s four massive planets formed; the 
planets’ approximate current orbits are overlaid and labelled b-e. 
The outer part of the disk was very cold and rotated slowly, and so 
might have collapsed through gravitational instabilities to quickly 
form outer planets such as ‘b’. The newly discovered ‘e’ planet is in a
very different zone, where the disk was much warmer and the planet
is likely to have formed in a slow, two-step ‘core-accretion’ process.
Neither theory of planet formation — gravitational collapse or core 
accretion — can explain the whole family of four planets. 


Marois et al. appeal to the dominant theory
of giant-planet formation: a slower process
than gravitational collapse (about 3.5 million
years at a distance of 10 au) in which solid
dust grains conglomerate into solid cores of
tens of Earth masses and then gravitationally
accrete disk gas to grow to Jupiter masses.
Such a ‘core-accretion’ process itself is only
marginally fast enough at 14.5 au to build up
HR 8799e’s roughly 10 Jupiter masses before
the disk gas accretes onto the star in less than
10 Myr. This formation timescale problem
becomes even more vexing if one considers 
that, at about 2.6 times the distance HR 8799e 
is from the host star, HR 8799c would require 
about 20 times longer (more than about 200 
Myr) to grow to the same mass at 38 au — 
long after the disk has lost all its gas. What's 
more, at 68 au, HR 8799b’s formation
is truly problematic, requiring
an even longer timescale (many times 
the age of the star) to have formed 
in situ by core accretion. Hence, nei- 
ther of the two favoured theories of 
giant-planet formation can explain 
how all the planets around HR 8799 
formed: HR 8799e is too close to have
formed by gravitational collapse, and 
HR 8799c and HR 8799b are too far 
out to have formed by core accretion 
(Fig. 1). 
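
The timescale argument above can be made concrete with a rough power law. The sketch below is an illustration under stated assumptions, not the calculation behind the quoted figures: it assumes the core-accretion growth time follows t ∝ a^n, takes about 10 Myr at 14.5 au (implied by the ‘marginally fast enough’ remark and the 10-Myr gas lifetime), fixes n from the quoted factor of 20 between 14.5 and 38 au, and then extrapolates to 68 au. The names `growth_time_myr` and `T_E_MYR` are hypothetical.

```python
import math

A_E, A_C, A_B = 14.5, 38.0, 68.0  # au, approximate orbital distances from the text
T_E_MYR = 10.0                    # assumed growth time at 14.5 au, in Myr
SLOWDOWN_C = 20.0                 # quoted: c needs about 20 times longer than e

# Exponent of the assumed power law t ~ a**n that reproduces the quoted factor.
n = math.log(SLOWDOWN_C) / math.log(A_C / A_E)

def growth_time_myr(a_au):
    """Growth time (Myr) at distance a_au under the assumed power law."""
    return T_E_MYR * (a_au / A_E) ** n

t_c = growth_time_myr(A_C)
t_b = growth_time_myr(A_B)

print(f"distance ratio c/e: {A_C / A_E:.2f}, implied exponent n: {n:.2f}")
print(f"growth time at 38 au: {t_c:.0f} Myr; at 68 au: {t_b:.0f} Myr")
```

With these inputs the implied exponent is a little above 3, and the extrapolated growth time at 68 au exceeds 1 Gyr, consistent with the statement that HR 8799b would need many times the age of the star to form in situ by core accretion.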

Perhaps all of these massive 
planets formed at much larger distances
(at least 50 au)
by the gravitational collapse of an 
unusually massive disk and then 
migrated quickly inwards to their cur- 
rent positions, somehow sweeping into 
a dynamically stable set of 1:2:4 orbital 
resonances (where, for every one orbit
of planet c, there are two of d and four 
of e). This does not really help the situ- 
ation, however, because it is unlikely 
that such a massive planet as HR 8799e 
could have migrated from about 50 to 
14.5 au by means of tidal torques from 
the residual gas that had not been used 
to build up the planets. The converse 
theory, by which the planets all form 
through core accretion within about 
10 au and then slowly move outwards 
by scattering lesser objects (planetesi- 
mals) inwards, is also problematic 
because there is probably too limited a reser- 
voir of planetesimals to move a 7-Jupiter-mass 
object such as HR 8799b outwards some 58 au.
So, despite having a clear view of the system — 
thanks to the power of adaptive-optics sys- 
tems and large ground-based telescopes — we 
cannot currently explain how all four planets 
formed in a coherent, coeval fashion. 

A key strength of direct imaging is that photons
can be collected from these self-luminous
young planets as they contract, allowing the 
planetary spectra to be observed (to calculate 
temperatures and luminosities). The observed 
brightness of HR 8799b in direct images is 
much lower than would be expected from 
its observed temperature, given that evolutionary
models indicate that HR 8799b must
have a radius larger than that of Jupiter.
This ‘under-luminosity’ problem is typical of
around half of the extrasolar planets imaged 
to date. One possible explanation is that dusty, 
thick, planetary-scale high-latitude cloud 
‘bands’ absorb/scatter light when viewing a 
young planet over its pole. For example, the 
‘under-luminous’ planets in the HR 8799 sys- 
tem are probably being viewed close to ‘pole- 
on’, perhaps leading to less light emitted in the 
direction of Earth. By contrast, ‘edge-on’ giant 
planets, such as β Pictoris b, look brighter
because light streams freely from the brighter 
equatorial regions between the dark cloud 
bands. Clearly, further theoretical (and direct 
imaging) work will be needed to identify the 
ultimate cause of this under-luminosity prob- 
lem. 
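
The link between brightness, temperature and radius that underlies the under-luminosity problem is the Stefan-Boltzmann relation, L = 4πR²σT⁴. The sketch below only illustrates that relation: the function names are hypothetical, the temperature, radius and brightness deficit are made-up illustrative inputs rather than measurements of HR 8799b, and the planet is treated as a uniform blackbody sphere, which is precisely the assumption that banded clouds would break.

```python
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
L_SUN = 3.828e26        # nominal solar luminosity, W
R_JUP = 7.1492e7        # Jupiter equatorial radius, m

def luminosity_lsun(radius_rjup, t_eff_k):
    """Blackbody luminosity (solar units) for a given radius and temperature."""
    r = radius_rjup * R_JUP
    return 4.0 * math.pi * r**2 * SIGMA * t_eff_k**4 / L_SUN

def implied_radius_rjup(lum_lsun, t_eff_k):
    """Radius (Jupiter radii) implied by a luminosity and a temperature."""
    r = math.sqrt(lum_lsun * L_SUN / (4.0 * math.pi * SIGMA * t_eff_k**4))
    return r / R_JUP

# Illustrative inputs only (NOT measured values for HR 8799b).
T_EFF = 1000.0       # K, hypothetical spectroscopic temperature
MODEL_RADIUS = 1.2   # Jupiter radii, hypothetical evolutionary-model radius

expected = luminosity_lsun(MODEL_RADIUS, T_EFF)
observed = 0.5 * expected                      # hypothetical factor-of-two deficit
radius = implied_radius_rjup(observed, T_EFF)  # radius implied by the deficit

print(f"expected luminosity: {expected:.2e} L_sun")
print(f"radius implied by observed brightness: {radius:.2f} R_Jup")
```

In this toy case a factor-of-two deficit in brightness implies a radius below that of Jupiter, smaller than the model radius assumed, which is the kind of mismatch the text describes.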

The future holds much promise for more 
surprises in the field of direct imaging of 
extrasolar planets. However, it seems unlikely
that any other massive outer planets will be
found around HR 8799. There is always a
chance, though, that low-mass terrestrial plan- 
ets lie within the star’s 10-au-radius ‘asteroid’ 
belt. The next chapter in this story will soon 
be written by even more powerful ground-based,
adaptive-optics imagers and, let us
hope, by more powerful pathfinding, space-based
planet- and disk-imaging telescopes.
These pathfinders should eventually lead to 
a terrestrial-planet-finding telescope even 
capable of taking spectra of Earth-like plan- 
ets. Such an achievement could address one 
of the most pivotal questions in science: how 
common are truly Earth-like planets and life in 
our Universe?